CONVERGENCE ANALYSIS OF APPROXIMATE PRIMAL
SOLUTIONS IN DUAL FIRST-ORDER METHODS

JIE LU AND MIKAEL JOHANSSON


Abstract. Dual first-order methods are powerful techniques for large-scale convex optimization. 
Although an extensive research effort has been devoted to studying their convergence properties, 
explicit convergence rates for the primal iterates have only been established under global Lipschitz 
continuity of the dual gradient. This is a rather restrictive assumption that does not hold for several 
important classes of problems. In this paper, we demonstrate that primal convergence rate guarantees 
can also be obtained when the dual gradient is only locally Lipschitz. The class of problems that 
we analyze admits general convex constraints including nonlinear inequality, linear equality, and set 
constraints. As an approximate primal solution, we take the minimizer of the Lagrangian, computed 
when evaluating the dual gradient. We derive error bounds for this approximate primal solution 
in terms of the errors of the dual variables, and establish convergence rates of the dual variables 
when the dual problem is solved using a projected gradient or fast gradient method. By combining 
these results, we show that the suboptimality and infeasibility of the approximate primal solution at 
iteration $k$ are no worse than $O(1/\sqrt{k})$ when the dual problem is solved using a projected gradient
method, and $O(1/k)$ when a fast dual gradient method is used.

1. Introduction. Lagrangian duality is a widely-used approach in large-scale
optimization, especially when there are a few constraints that complicate an otherwise
simple problem [1,2]. Although many first-order methods can be applied to solve
such problems directly in the primal space, the iteration cost can be very high since
the projection onto the constraint set is often computationally difficult [3]. The
corresponding dual problem has a more desirable structure: the dual constraint set has a
simple form and the (sub)gradient of the dual function is relatively easy to evaluate.
In addition, the dual function is often additive and suitable for distributed
implementation, which has been exploited in a wide range of recent applications, including
communication systems [4,5], large-scale control [6], and multi-agent systems [7].

There are many practical and theoretical subtleties in using dual optimization 
methods to generate optimal solutions to the engineering problems cited above. First, 
one needs to ensure that the dual optimal value agrees with the primal optimal value 
(i.e., that there is no duality gap). For convex optimization problems, this can be
done by verifying Slater’s constraint qualifications [2]. Then, one typically needs to 
guarantee that the iterates generated by the dual optimization method converge to a 
dual optimum, which is not always true. For instance, the subgradient method with 
constant step-size achieves suboptimality only. Further, for most applications it is 
desirable to construct approximate primal solutions (representing the actual decisions 
to implement) from the dual iterates. Whether the approximate primal solutions 
converge to a primal optimal solution or not is often of great practical concern.
Moreover, to be able to assess solution times and understand how they depend on problem
data, it is preferable to estimate how quickly the solution converges. This motivates 
research on on-line construction of approximate primal solutions and studying their 
convergence properties. 

A number of results on the convergence properties of approximate primal
solutions have been reported in the literature [6,8-14]. At one extreme are results on
non-smooth convex problems with nonlinear constraints, e.g., [8,9], where the
corresponding dual function is also non-smooth in general. For such problems, one typically
applies the subgradient method to the dual problem and forms running averages of
the generated primal iterates to construct an approximate primal solution. Such an
approximate primal solution converges asymptotically to the primal optimal set with
diminishing step-sizes [8] and has guaranteed bounds on suboptimality and
infeasibility when a constant step-size is used [9]. At the other extreme are problems for which
the dual function is differentiable and has globally Lipschitz continuous gradient over
the entire dual feasible set, e.g., [6,10-14]. To ensure differentiability of the dual
function, one often needs to assume strong convexity of the objective function [6,11-13] or
approach the dual problem using an augmented Lagrangian [10,14]. To make the dual
gradient globally Lipschitz, the references cited above typically require the inequality
and equality constraints to be linear. One exception is [13], which allows for
nonlinear inequality constraints, but not equality constraints. However, [13] assumes that
both the objective and the inequality constraint functions are twice differentiable and
that the Jacobian of the constraint functions is element-wise bounded. The globally
Lipschitz dual gradient not only simplifies analysis but also allows the application
of dual gradient and fast gradient methods (e.g., [3,15,16]) that achieve sublinear
convergence rates for the dual iterates. This leads to sublinear convergence rates of
the approximate primal solution, be it either the primal iterates [6,10,12] or their
running average [11,13,14].

In this paper, we consider a general class of convex optimization problems that 
covers the less explored middle ground between these two extremes. In particular, 
we focus on a class of convex optimization problems with a strongly convex but 
not necessarily differentiable objective function. We allow the problems to have all 
three types of convex constraints: nonlinear inequalities, linear equalities, and set 
constraints, while the related references [6, 8-14] tackle problems in the absence of 
either nonlinear constraints or equality constraints. This problem class leads to a 
differentiable dual function with locally Lipschitz gradient on the dual feasible set and 
generalizes the problems with globally Lipschitz dual gradient considered in [6,11-13]. 

For this problem class, we consider the unique minimizer of the Lagrangian for 
given dual variables as an approximate primal solution and relate the errors of this 
approximate primal solution in primal optimality and feasibility to those of the dual 
variables in dual optimality. Based on such relationships, we study the convergence 
properties of the approximate primal solution when the dual variables are generated 
from the application of the classical projected gradient and fast gradient methods to 
the dual problem. Specifically, by imposing mild assumptions on the smoothness of
the inequality constraint functions, we construct a sufficient condition on the
step-size to guarantee convergence of the dual iterates generated by the projected dual
gradient method and prove that they converge sublinearly at a rate of order $O(1/k)$.
It is worthwhile to mention that this is a new result, as the existing results on the
$O(1/k)$ convergence rate of the projected gradient method are established under global
Lipschitz continuity of the objective gradient, while for our problem, the dual gradient
is only locally Lipschitz on the dual feasible set. This leads to one of our main results,
which states that the primal iterates (i.e., our approximate primal solution at each
iteration) converge to optimality and feasibility at a rate no worse than $O(1/\sqrt{k})$.
By assuming boundedness of the subgradients of the inequality constraint functions,
we show that the fast gradient methods in [15,16] can be applied to solve the dual
problem and guarantee the $O(1/k^2)$ convergence rate of the dual iterates. As a result,
the convergence rates of the primal iterates in both optimality and feasibility are
improved to $O(1/k)$.

The paper is organized as follows: Section 2 gives a formal problem statement, 
while Section 3 establishes bounds for the error of the approximate primal solution in 
terms of errors of the dual variables. Convergence rate bounds for the dual and primal 
iterates in several dual first-order methods are derived in Section 4. Section 5 uses 
simulations to compare the practical performance of different choices of approximate 
primal solutions in various dual first-order methods. Finally, Section 6 concludes the 
paper. The proofs are in the appendix. 

1.1. Notation. The following notation is adopted throughout the paper: Let
$\mathbb{R}^n_+$ and $\mathbb{R}^n_-$ be the sets of nonnegative and negative vectors in $\mathbb{R}^n$, respectively. For
a vector $x \in \mathbb{R}^n$, let $x^{(i)} \in \mathbb{R}$, $i = 1,2,\ldots,n$ denote the $i$th element of $x$ and
$x^{(i:j)} \in \mathbb{R}^{j-i+1}$, $1 \le i \le j \le n$ the vector consisting of the $i$th, $(i+1)$th, ...,
$j$th elements of $x$. In addition, for any $x,y \in \mathbb{R}^n$, let $\max\{x,y\}$ be the element-wise
maximum operation, i.e., $(\max\{x,y\})^{(i)} = \max\{x^{(i)}, y^{(i)}\}$ $\forall i \in \{1,\ldots,n\}$. We use
$\|\cdot\|$, $\|\cdot\|_1$, $\|\cdot\|_\infty$, and $\|\cdot\|_F$ to represent the Euclidean, $\ell_1$, infinity, and Frobenius
norm, respectively. For any matrix $A \in \mathbb{R}^{p \times n}$, let $\sigma_{\max}(A) = \sqrt{\lambda_{\max}(A^T A)}$ and
$\sigma_{\min}(A) = \max\{\sqrt{\lambda_{\min}(A A^T)}, \sqrt{\lambda_{\min}(A^T A)}\}$, where $\lambda_{\max}(\cdot)$ and $\lambda_{\min}(\cdot)$ represent
the largest and smallest eigenvalues of a real symmetric matrix. We allow $A$ to have
zero dimension, i.e., $p = 0$ or $n = 0$, in which cases we let $\sigma_{\max}(A) = 0$. For any
function $h : \mathbb{R}^n \to \mathbb{R}$, let $\partial h(x) \subset \mathbb{R}^n$ be its subdifferential at $x \in \mathbb{R}^n$. If $h$ is differentiable
at $x$, then $\partial h(x) = \{\nabla h(x)\}$, where $\nabla h(x) \in \mathbb{R}^n$ is the gradient of $h$ at $x$ and its $i$th
element is represented by $\nabla^{(i)} h(x)$. For any set $Q \subset \mathbb{R}^n$, let $\operatorname{relint} Q$ be its relative
interior, $\operatorname{conv} Q$ its convex hull, $\operatorname{diam}(Q)$ its diameter, $|Q|$ its cardinality, and $\mathcal{P}_Q[\cdot]$
the projection onto $Q$.

2. Problem formulation. We consider the following optimization problem with 
inequality, equality, and set constraints: 

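(2.1)    $\begin{array}{ll} \underset{x \in \mathbb{R}^n}{\text{minimize}} & f(x) \\ \text{subject to} & g^{(i)}(x) \le 0, \quad i = 1,2,\ldots,m, \\ & Ax + b = 0, \\ & x \in X. \end{array}$

Here, $f : \mathbb{R}^n \to \mathbb{R}$ is the objective function, $g^{(i)} : \mathbb{R}^n \to \mathbb{R}$, $\forall i \in \{1,2,\ldots,m\}$ represent
the nonlinear inequality constraint functions, $A \in \mathbb{R}^{p \times n}$ and $b \in \mathbb{R}^p$ encode the linear
equality constraints, and $X \subset \mathbb{R}^n$ is a closed and convex set. In addition, let the
following assumption hold: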

Assumption 1. Problem (2.1) satisfies the following:

(a) The objective function $f$ is strongly convex over $X$ with convexity parameter
$\theta > 0$, i.e., $f(y) - f(x) - \tilde{\nabla} f(x)^T (y - x) \ge \frac{\theta}{2}\|x - y\|^2$, $\forall x,y \in X$, $\forall \tilde{\nabla} f(x) \in
\partial f(x)$.*

(b) Each $g^{(i)}$, $i \in \{1,2,\ldots,m\}$ is convex over $X$ and satisfies a Lipschitz
condition on $X$: $|g^{(i)}(x) - g^{(i)}(y)| \le L_i \|x - y\|$ $\forall x,y \in X$ for some $L_i > 0$.

(c) There exists $\tilde{x} \in \operatorname{relint} X$ such that $g^{(i)}(\tilde{x}) < 0$ $\forall i \in \{1,2,\ldots,m\}$ and $A\tilde{x} + b = 0$.

(d) The number of inequality and equality constraints is not zero, i.e., $m + p \ne 0$.
If $p \ne 0$, then $A$ is not a zero matrix.

*$f$ is not necessarily differentiable. For instance, $f$ could be a quadratic function plus an $\ell_1$ norm.

Assumptions 1(a), 1(b), and 1(c) guarantee that there is a unique optimal solution
$x^*$ to problem (2.1) and that the optimal value $f^* = f(x^*)$ is finite. In addition,
they ensure that problem (2.1) has no duality gap when dualizing the inequality and
equality constraints, i.e., $f^*$ is equal to the optimal value $d^*$ of the corresponding dual
problem, and that the dual optimal set $D^*$ is nonempty [2, Prop. 5.3.2]. Note that
we only require the convexity and Lipschitz continuity in Assumptions 1(a) and 1(b)
to hold over $X$, and not globally over the entire $\mathbb{R}^n$.

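To formulate the dual problem of (2.1), we first introduce the Lagrangian function
$\mathcal{L} : \mathbb{R}^n \times \mathbb{R}^{m+p} \to \mathbb{R}$ associated with (2.1):

$\mathcal{L}(x,u) = f(x) + \sum_{i=1}^m u^{(i)} g^{(i)}(x) + (u^{(m+1:m+p)})^T (Ax + b).$

Given the Lagrangian $\mathcal{L}$, the dual function $d$ can be expressed as

(2.2)    $d(u) = \min_{x \in X} \mathcal{L}(x,u) = f(x(u)) + \sum_{i=1}^m u^{(i)} g^{(i)}(x(u)) + (u^{(m+1:m+p)})^T (A x(u) + b),$

where

$x(u) \in \arg\min_{x \in X} \mathcal{L}(x,u),$

and the Lagrange dual problem of (2.1) is

(2.3)    $\begin{array}{ll} \underset{u \in \mathbb{R}^{m+p}}{\text{maximize}} & d(u) \\ \text{subject to} & u \in D = \{u \in \mathbb{R}^{m+p} : u^{(1:m)} \in \mathbb{R}^m_+\}. \end{array}$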

Since $d$ is concave, the dual problem (2.3) is a convex optimization problem. Moreover,
for every $u \in D$, $\mathcal{L}(\cdot,u)$ is strongly convex over $X$, so $x(u)$ exists and is unique.
Furthermore, $x(u) = x^*$ if $u = u^*$ for some dual optimal solution $u^* \in D^*$ [17, Prop.
6.1.1]. This makes $x(u)$ a legitimate candidate for an approximate primal solution
to (2.1) based on the dual variable $u \in D$.
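Because the dual feasible set $D$ only restricts the first $m$ entries of $u$ to be nonnegative, the projection onto $D$ used by the dual methods later in the paper reduces to an element-wise clip. The following is a minimal sketch of that operation; the function name and the plain-list representation of $u$ are illustrative assumptions, not part of the paper.

```python
def project_onto_dual_feasible_set(u, m):
    """Project u in R^(m+p) onto D = {u : u^(1:m) >= 0}.

    The first m entries (multipliers of the inequality constraints) are
    clipped at zero; the remaining p entries (multipliers of the equality
    constraints) are unconstrained and left unchanged.
    """
    return [max(0.0, ui) if i < m else ui for i, ui in enumerate(u)]


# Hypothetical example with m = 2 inequality and p = 1 equality multipliers.
projected = project_onto_dual_feasible_set([-1.5, 2.0, -0.3], m=2)
```

Only the inequality multipliers are affected, so the cost of the projection is linear in $m$.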

Next, we establish the boundedness of $x(u)$ and then the differentiability of $d$:

Lemma 2.1. Consider problem (2.1) under Assumption 1. Then, for any compact
set $S \subset D$, the set $\{x(u) : u \in S\}$ is bounded.

Proof. See Appendix 7.1. □

With Lemma 2.1 and Danskin's Theorem [2], it can be shown that the dual
function $d$ is differentiable at every point in $D$. Moreover, for any $u \in D$,

(2.4)    $\nabla d(u) = [g^{(1)}(x(u)), \ldots, g^{(m)}(x(u)), (A x(u) + b)^T]^T.$

Remark 1. Equations (2.2) and (2.4) suggest that the primal function value at
$x(u)$, i.e., $f(x(u))$, can be expressed in terms of the dual variable $u \in D$ and the dual
function $d$ as follows:

(2.5)    $f(x(u)) = d(u) - \nabla d(u)^T u, \quad \forall u \in D.$

This relationship will be essential in the results that we derive shortly.
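To make the objects above concrete, the sketch below evaluates the Lagrangian minimizer, the dual function, and the dual gradient on a small toy instance and compares the two sides of the identity relating $f(x(u))$ to $d(u)$ and $\nabla d(u)$. The instance is an assumption made purely for illustration: $f(x) = \frac{\theta}{2}\|x - c\|^2$ on $X = \mathbb{R}^2$, one inequality $x^{(1)} + x^{(2)} - 1 \le 0$, and one equality $x^{(1)} - x^{(2)} = 0$, for which the minimizer of the Lagrangian has a closed form.

```python
THETA = 2.0          # strong convexity parameter of f (hypothetical value)
C = (2.0, 1.0)       # f(x) = (THETA / 2) * ||x - C||^2 on X = R^2


def f(x):
    return 0.5 * THETA * ((x[0] - C[0]) ** 2 + (x[1] - C[1]) ** 2)


def constraints(x):
    """Return [g(x), A x + b] for g(x) = x1 + x2 - 1 and the equality x1 - x2 = 0."""
    return [x[0] + x[1] - 1.0, x[0] - x[1]]


def lagrangian_minimizer(u):
    """Solve THETA * (x - C) + u1 * [1, 1] + u2 * [1, -1] = 0 for x."""
    return (C[0] - (u[0] + u[1]) / THETA, C[1] - (u[0] - u[1]) / THETA)


def dual_value(u):
    x = lagrangian_minimizer(u)
    r = constraints(x)
    return f(x) + u[0] * r[0] + u[1] * r[1]


def dual_gradient(u):
    # The constraint values at the Lagrangian minimizer.
    return constraints(lagrangian_minimizer(u))


u = (0.7, -0.3)
grad = dual_gradient(u)
lhs = f(lagrangian_minimizer(u))                            # f(x(u))
rhs = dual_value(u) - (grad[0] * u[0] + grad[1] * u[1])     # d(u) - grad d(u)^T u
```

In this sketch `lhs` and `rhs` agree up to rounding, and a finite-difference approximation of `dual_value` can be used to sanity-check `dual_gradient`.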

In the above setting, the goal of this paper is to (a) quantify how close the
approximate primal solution $x(u)$ is to optimality and feasibility for any given dual
feasible point $u \in D$, and (b) derive the convergence rate of $x(u)$ when the dual
problem (2.3) is solved using some common first-order methods. We investigate
optimality in terms of both the distance to the optimizer

$\|x(u) - x^*\|$

and the error in primal objective value

$|f(x(u)) - f^*|,$

while primal infeasibility is captured by the quantity

$\Delta(x(u)) = \Big(\|A x(u) + b\|^2 + \sum_{i=1}^m \big(\max\{0, g^{(i)}(x(u))\}\big)^2\Big)^{1/2}.$
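The infeasibility measure combines the equality residual with the positive parts of the inequality constraint values. A minimal sketch of its evaluation follows; the function name and the list-based inputs are illustrative assumptions.

```python
import math


def primal_infeasibility(eq_residual, ineq_values):
    """Evaluate sqrt(||A x + b||^2 + sum_i max(0, g_i(x))^2).

    eq_residual: entries of A x + b (equality residual).
    ineq_values: values g_i(x) of the inequality constraint functions.
    """
    eq_part = sum(r * r for r in eq_residual)
    ineq_part = sum(max(0.0, g) ** 2 for g in ineq_values)
    return math.sqrt(eq_part + ineq_part)


# Hypothetical values: one violated inequality (4.0), one satisfied (-2.0).
delta = primal_infeasibility([3.0], [4.0, -2.0])
```

Satisfied inequalities contribute nothing, so the measure is zero exactly when the point satisfies all inequality and equality constraints.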


2.1. Comparison with related work. It is instructive to compare problem (2.1)
with the problem classes considered in the related works [6,8-14] that also study
primal convergence in dual first-order methods.

First of all, note that (2.1) allows for all three types of standard convex constraints 
(convex inequality, linear equality, and convex set constraints), while [6,8-14] do 
not. The constraints in [6,10-12,14] must be linear, and although [8,9,13] consider 
nonlinear inequality constraints, they do not allow for linear equality constraints. 

Like our Assumption 1, references [6,11-13] also assume strong convexity of the
objective function $f$. Clearly, problem (2.1) generalizes the linearly constrained
problems considered in [6,11,12]. In addition, the problem class with nonlinear inequality
constraints in [13] requires that the objective and the inequality constraint functions 
are twice differentiable and that the Jacobian of the inequality constraint functions is 
element-wise bounded. These are more restrictive than Assumption 1. 

Strong convexity of the objective function is relaxed to convexity in [8-10,14]. In 
[8,9], the dual function is non-differentiable and therefore only the subgradient method 
can be applied to the dual, which explains the lack of convergence rate guarantees.
In [10,14], quadratic augmented Lagrangians are used to obtain a differentiable dual 
function. Nevertheless, they still require the constraint set X to be compact. 


3. Primal errors in optimality and feasibility. In this section, we bound
the errors of the approximate primal solution $x(u) \in X$ in optimality and feasibility
in terms of the errors of the dual variable $u \in D$. To present our first result, we
introduce the following notation: for any $u \in D$, let

(3.1)    $\gamma(u) = \frac{\sqrt{m+1}}{\theta} \max\Big\{\sigma_{\max}(A), \sup_{q \in G(u)} \|q\|\Big\},$

where

(3.2)    $G(u) = \bigcup_{i=1}^m \partial g^{(i)}(x(u)) \subset \mathbb{R}^n.$

Since $G(u)$ is a compact set [17, Prop. 4.2.1], $0 \le \gamma(u) < \infty$. Also, if $\gamma(u') = 0$ for
some $u' \in D$, then $x(u) = x^*$ $\forall u \in D$.† This means that the primal optimal solution
$x^*$ can be simply found by arbitrarily picking $u \in D$ and computing $x(u)$. Hence, in
the rest of the paper, we exclude this trivial case and assume $\gamma(u) > 0$ $\forall u \in D$.

Then, consider the following lemma: 

Lemma 3.1. Consider problem (2.1) under Assumption 1. For any $u,v \in D$,

(3.3)    $\|x(u) - x(v)\| \le \min\{\gamma(u), \gamma(v)\} \|u - v\|,$

where $\gamma(u), \gamma(v) \in (0,\infty)$ are defined in (3.1).

Proof. See Appendix 7.2. □

Lemma 3.1 allows one to relate the primal error $\|x(u) - x^*\|$ to the dual error
$\|u - u^*\|$ for any $u \in D$ and any $u^* \in D^*$. In addition, the next theorem bounds
$\|x(u) - x^*\|$ by virtue of the error $d^* - d(u)$ in dual optimality.

Theorem 3.2. Consider problem (2.1) under Assumption 1. For any $u \in D$ and
any $u^* \in D^*$,

(3.4)    $\|x(u) - x^*\| \le \gamma(u^*) \|u - u^*\|,$

(3.5)    $\|x(u) - x^*\| \le \sqrt{\frac{2(d^* - d(u))}{\theta}},$

where $\gamma(u^*) \in (0,\infty)$ is defined in (3.1).

Proof. See Appendix 7.3. □

Note that both Lemma 3.1 and Theorem 3.2 do not require the Lipschitz condition 
in Assumption 1(b), which, however, is needed for deriving other results below. 
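As a numerical illustration of bounding the primal distance $\|x(u) - x^*\|$ by the dual optimality error $d^* - d(u)$, the sketch below checks the inequality $\|x(u) - x^*\| \le \sqrt{2(d^* - d(u))/\theta}$ at a few dual feasible points. The toy instance is an assumption chosen so that everything is available in closed form: $f(x) = \frac{\theta}{2}\|x - c\|^2$ with $\theta = 2$, $c = (2, 1)$, $X = \mathbb{R}^2$, the inequality $x^{(1)} + x^{(2)} - 1 \le 0$, and the equality $x^{(1)} - x^{(2)} = 0$, for which $x^* = (0.5, 0.5)$ and $d^* = f^* = 2.5$ (worked out by hand for this instance).

```python
import math

THETA = 2.0
C = (2.0, 1.0)
X_STAR = (0.5, 0.5)   # primal optimum of the toy instance (computed by hand)
D_STAR = 2.5          # optimal value f* = d* of the toy instance


def x_of_u(u):
    """Closed-form minimizer of the Lagrangian of the toy instance."""
    return (C[0] - (u[0] + u[1]) / THETA, C[1] - (u[0] - u[1]) / THETA)


def dual_value(u):
    x = x_of_u(u)
    f = 0.5 * THETA * ((x[0] - C[0]) ** 2 + (x[1] - C[1]) ** 2)
    return f + u[0] * (x[0] + x[1] - 1.0) + u[1] * (x[0] - x[1])


def distance_and_bound(u):
    x = x_of_u(u)
    distance = math.hypot(x[0] - X_STAR[0], x[1] - X_STAR[1])
    # max(0, .) guards against a tiny negative gap caused by rounding.
    bound = math.sqrt(2.0 * max(0.0, D_STAR - dual_value(u)) / THETA)
    return distance, bound


# Dual feasible test points (first entry nonnegative); the last one is u*.
test_points = [(0.0, 0.0), (1.0, -2.0), (3.5, 0.4), (2.0, 1.0)]
results = [distance_and_bound(u) for u in test_points]
```

Each pair in `results` holds the distance to $x^*$ and the corresponding bound. On this particular quadratic instance the inequality constraint is active at $x^*$, so the two numbers coincide up to rounding; in general the bound need not be tight.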

Having derived bounds on $\|x(u) - x^*\|$, we turn our attention to the primal error
$|f(x(u)) - f^*|$. To this end, for any compact subset $S \subset D$, define

(3.6)    $L(S) = \sup_{u \in S} \gamma(u) \Big(\sigma_{\max}^2(A) + \sum_{i=1}^m L_i^2\Big)^{1/2} > 0.$

From Lemma 2.1 and [17, Prop. 4.2.3], the boundedness of $S$ implies that the set
$\bigcup_{u \in S} G(u)$ is bounded, so $L(S) < \infty$. Next, we show that $L(S)$ is a Lipschitz constant
of $\nabla d$ on the compact set $S \subset D$:

Proposition 3.3. Consider problem (2.1) under Assumption 1. Then, on every
compact set $S \subset D$, $\nabla d$ satisfies a Lipschitz condition:

(3.7)    $\|\nabla d(u) - \nabla d(v)\| \le L(S) \|u - v\|, \quad \forall u,v \in S,$

where $L(S) \in (0,\infty)$ is defined in (3.6). Moreover, if $S$ is convex,

(3.8)    $d(v) - d(u) - \nabla d(u)^T (v - u) \ge -\frac{L(S)}{2} \|v - u\|^2, \quad \forall u,v \in S.$


Proof. See Appendix 7.4. □

†To see this, note that $\gamma(u') = 0$ implies $p = 0$ and $\partial g^{(i)}(x(u')) = \{0\}$ $\forall i \in \{1,\ldots,m\}$. Hence,
$\partial_x \mathcal{L}(x(u'),u) = \partial f(x(u'))$ $\forall u \in D$, where $\partial_x \mathcal{L}$ represents the subdifferential of $\mathcal{L}$ with respect to
the first argument. Since $x(u')$ minimizes $\mathcal{L}(\cdot,u')$ over $X$, there exists $\tilde{\nabla} f(x(u')) \in \partial f(x(u'))$ such
that $\tilde{\nabla} f(x(u'))^T (x - x(u')) \ge 0$ $\forall x \in X$. Therefore, $x(u') = \arg\min_{x \in X} \mathcal{L}(x,u) = x(u)$ $\forall u \in D$ and thus
$x(u) = x^*$ $\forall u \in D$.

The local Lipschitz continuity of $\nabla d$ on $D$ established in Proposition 3.3 allows us
to guarantee bounds on the primal error $|f(x(u)) - f^*|$ and the primal infeasibility
$\Delta(x(u))$ of any $x(u)$ with $u$ in some compact set $S \subset D$. The basic idea for deriving
such bounds is to use (2.5), which gives

(3.9)    $f(x(u)) - f^* = -\nabla d(u)^T u + d(u) - d^*$

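and then bound $-\nabla d(u)^T u$ using (3.8) with $v = \mathcal{P}_D[u + \beta \nabla d(u)]$ for a suitable $\beta \ge 0$. However, since such
a $v$ may not belong to $S$, we introduce the set

(3.10)    $\Phi(S) = \operatorname{conv}\big\{\mathcal{P}_D[u + \beta \nabla d(u)] : u \in S, \ \beta \in [0, 1/L(S)]\big\},$

which is compact and convex. In addition, $S \subset \Phi(S) \subset D$. Hence, if $u \in S$ and we
let $v = \mathcal{P}_D[u + \beta \nabla d(u)]$ with $\beta \in [0, 1/L(S)]$, then $v \in \Phi(S)$ and we can apply (3.8) over $\Phi(S)$.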

The following theorem provides the formal results: 

Theorem 3.4. Consider problem (2.1) under Assumption 1. Let $S \subset D$ be
compact. Then, for any $u \in S$ and any $u^* \in D^*$,

(3.11)    $f(x(u)) - f^* \le \|u\|_\infty \sqrt{2 L(\Phi(S)) (m+p) (d^* - d(u))},$

(3.12)    $f(x(u)) - f^* \ge -\|u^*\| \sqrt{2 L(\Phi(S)) (d^* - d(u))},$

(3.13)    $\Delta(x(u)) \le \sqrt{2 L(\Phi(S)) (d^* - d(u))},$

where $\Phi(S)$ and $L(\Phi(S)) \in (0,\infty)$ are defined in (3.10) and (3.6).

Proof. See Appendix 7.5. □

The bounds provided in (3.11), (3.12), and (3.13) depend on the compact set
$\Phi(S)$ defined in (3.10). Thus, unlike Theorem 3.2, the results in Theorem 3.4 only
hold locally, which stems from the fact that $\nabla d$ is locally Lipschitz continuous on
$D$. Nevertheless, under the assumption below, similar conclusions can be established
globally over $D$:

Assumption 2. The set $\bigcup_{u \in D} G(u)$ is bounded.

Assumption 2 can be satisfied when each constraint function $g^{(i)}$, $i = 1,2,\ldots,m$ is
affine or the constraint set $X$ is compact (cf. [17, Prop. 4.2.3]). For another example,
if each $g^{(i)}$ is differentiable at every point of $X$ and satisfies the Lipschitz condition in
Assumption 1(b) on an open set containing $X$, then $\|\nabla g^{(i)}(x)\| \le L_i$ $\forall x \in X$, which
implies that Assumption 2 holds. However, if the Lipschitz condition only holds on $X$
as in Assumption 1(b), then $\|\nabla g^{(i)}(x)\|$ may be unbounded on $X$‡ and Assumption 2
is thus not guaranteed.

Remark 2. Note that even when both Assumption 1 and Assumption 2 are
imposed, our results generalize those in references [6,11-13], since we do not require
$f$ and $g^{(i)}$, $i = 1,2,\ldots,m$ to be differentiable. Also note that Assumption 2 is not
universally imposed throughout the paper; most results hold without Assumption 2.

tPor instance, let X = {x S : xPl > 1, = 1}. Also, let gC , R be defined as 

( 2 ) 

^(0(x) = —, which is differentiable, is convex, and satisfies a Lipschitz condition on X. 
However, = 1 + Ina:^^))^ Va: E X, which is unbounded. 
Under Assumption 2, we have $\sup_{u \in D} \gamma(u) < \infty$, which leads to the Lipschitz
continuity of $\nabla d$ on $D$ and the following error bounds:

Corollary 3.5. Consider problem (2.1) under Assumptions 1 and 2. Then, $\nabla d$
satisfies a Lipschitz condition on $D$: $\|\nabla d(u) - \nabla d(v)\| \le L \|u - v\|$ $\forall u,v \in D$, where

$L = \sup_{u \in D} \gamma(u) \Big(\sigma_{\max}^2(A) + \sum_{i=1}^m L_i^2\Big)^{1/2} \in (0,\infty).$

Moreover, for any $u \in D$, (3.11)-(3.13) hold with $L(\Phi(S))$ replaced by $L$.

Proof. See Appendix 7.6. □

In the final part of this section, we study a special case of (2.1) where all the
constraints are linear and derive sharper and more explicit primal error bounds.
Specifically, we consider

(3.14)    $\begin{array}{ll} \underset{x \in \mathbb{R}^n}{\text{minimize}} & f(x) \\ \text{subject to} & A'x + b' \le 0, \\ & Ax + b = 0, \\ & x \in X, \end{array}$

where $A' \in \mathbb{R}^{m \times n}$, $b' \in \mathbb{R}^m$, and $\le$ represents element-wise inequality. For
convenience, let $\mathbf{A} = [(A')^T, A^T]^T \in \mathbb{R}^{(m+p) \times n}$ and $\mathbf{b} = [(b')^T, b^T]^T \in \mathbb{R}^{m+p}$. Without loss
of generality, we assume $\mathbf{A}$ is not a zero matrix.

If $f$ is strongly convex over the whole $\mathbb{R}^n$, $X$ is a polyhedral set, and the constraint
set of problem (3.14) is nonempty, then Assumption 1(c) can be removed [2, Prop.
5.2.1]. Also, Assumption 2 is automatically satisfied for this problem due to the
linearity of the constraints. Besides, $x(u)$ exists and is unique for any $u \in \mathbb{R}^{m+p}$ and
$d$ is differentiable over $\mathbb{R}^{m+p}$.

Following the proof of Lemma 3.1, we show in the corollary below that the distance
between approximate primal solutions is proportional to that between the
corresponding dual variables:

Corollary 3.6. Consider the linearly constrained problem (3.14) under
Assumption 1. Then, for any $u,v \in \mathbb{R}^{m+p}$,

$\|x(u) - x(v)\| \le \frac{\sigma_{\max}(\mathbf{A})}{\theta} \|u - v\|.$

Proof. See Appendix 7.7. □

Since the inequality constraints are linear in (3.14), the bound provided in
Corollary 3.6 is independent of $u$ and $v$. Moreover, it is tighter than that in Lemma 3.1,
i.e., $\frac{\sigma_{\max}(\mathbf{A})}{\theta} \le \min\{\gamma(u), \gamma(v)\}$. This can be seen from the facts that $\sup_{q \in G(u)} \|q\| \ge
\frac{1}{\sqrt{m}} \|A'\|_F \ge \frac{1}{\sqrt{m}} \sigma_{\max}(A')$ and that $\sigma_{\max}^2(\mathbf{A}) \le \sigma_{\max}^2(A') + \sigma_{\max}^2(A)$. When there are
no inequality constraints, i.e., $m = 0$, the two bounds are equal.
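The proportionality between primal and dual distances in the linearly constrained case can be observed numerically. The sketch below samples random pairs of dual points and records the largest ratio $\|x(u) - x(v)\| / \|u - v\|$, to be compared with the largest singular value of the stacked constraint matrix divided by $\theta$. The instance is an assumption for illustration: $f(x) = \frac{\theta}{2}\|x - c\|^2$ on $X = \mathbb{R}^2$ with stacked constraint matrix rows $(1, 2)$ and $(1, -1)$, for which $x(u) = c - \mathbf{A}^T u / \theta$.

```python
import math
import random

THETA = 2.0
C = (2.0, 1.0)
A = [[1.0, 2.0],      # row of the (hypothetical) inequality constraint
     [1.0, -1.0]]     # row of the (hypothetical) equality constraint


def x_of_u(u):
    """x(u) = c - A^T u / theta for f(x) = (theta / 2) * ||x - c||^2 and X = R^2."""
    return [C[j] - sum(A[i][j] * u[i] for i in range(2)) / THETA for j in range(2)]


def norm(v):
    return math.sqrt(sum(t * t for t in v))


# A^T A = [[2, 1], [1, 5]] has eigenvalues (7 +/- sqrt(13)) / 2.
sigma_max = math.sqrt((7.0 + math.sqrt(13.0)) / 2.0)

rng = random.Random(0)
worst_ratio = 0.0
for _ in range(1000):
    u = [rng.uniform(0.0, 5.0), rng.uniform(-5.0, 5.0)]
    v = [rng.uniform(0.0, 5.0), rng.uniform(-5.0, 5.0)]
    du = norm([a - b for a, b in zip(u, v)])
    if du == 0.0:
        continue  # skip the (practically impossible) case u == v
    dx = norm([a - b for a, b in zip(x_of_u(u), x_of_u(v))])
    worst_ratio = max(worst_ratio, dx / du)
```

For this instance `worst_ratio` stays below `sigma_max / THETA`, in line with a bound that does not depend on the particular dual points.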

Due to the linearity of the constraints, the gradient of the dual function is globally
Lipschitz continuous over the whole space with the Lipschitz constant $\sigma_{\max}^2(\mathbf{A})/\theta$ [12].

Based on this, we provide global error bounds on primal optimality and feasibility: 

Proposition 3.7. Consider the linearly constrained problem (3.14) under
Assumption 1. Then, for any $u \in D$ and any $u^* \in D^*$,

(3.15)    $-\|u^*\| \sigma_{\max}(\mathbf{A}) \sqrt{\frac{2(d^* - d(u))}{\theta}} \le f(x(u)) - f^* \le \|u\| \sigma_{\max}(\mathbf{A}) \sqrt{\frac{2(d^* - d(u))}{\theta}},$

(3.16)    $\Delta(x(u)) \le \sigma_{\max}(\mathbf{A}) \sqrt{\frac{2(d^* - d(u))}{\theta}}.$

Proof. See Appendix 7.8. □

Again, it can be shown that the upper bound in (3.15) is not specialized from 
and is tighter than that in (3.11). 

4. Primal convergence in dual first-order methods. In this section, we use 
the connections between primal and dual errors that are built in Section 3 to analyze 
the convergence properties of the approximate primal solution when some common 
first-order methods are employed to solve the dual problem.

4.1. Primal convergence in the projected dual gradient method. We
first consider the projected dual gradient method. Let the dual iterates $(u_k)_{k=0}^\infty \subset D$
be generated by

(4.1)    $u_{k+1} = \mathcal{P}_D[u_k + \alpha \nabla d(u_k)], \quad \forall k \ge 0,$

from an arbitrary initial point $u_0 \in D$, where $\alpha > 0$ is the step-size. To derive the
convergence rates of $(u_k)_{k=0}^\infty$ and $(x(u_k))_{k=0}^\infty$, we impose the following assumption:

Assumption 3. Problem (2.1) satisfies the following:

(a) The constraint functions $g^{(i)}$ $\forall i \in \{1,2,\ldots,m\}$ are differentiable at every
point in $X$.

(b) There exists $\tilde{u} \in \mathbb{R}^{m+p}$ such that $\tilde{u}^{(1:m)} \in \mathbb{R}^m_-$ and $\mathcal{L}(\cdot,\tilde{u})$ is strongly convex
over $X$.

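To satisfy Assumption 3(b), it suffices that each $\nabla g^{(i)}$ satisfies a Lipschitz
condition on $X$ with Lipschitz constant $L_i' > 0$. To see this, note from the proof of [3,
Lemma 1.2.3] that for each $i \in \{1,2,\ldots,m\}$, $g^{(i)}(x_1) - g^{(i)}(x_2) - \nabla g^{(i)}(x_2)^T (x_1 - x_2) \le
\frac{L_i'}{2} \|x_1 - x_2\|^2$ $\forall x_1,x_2 \in X$. Hence, by letting $\tilde{u} \in \mathbb{R}^{m+p}$ be such that $\tilde{u}^{(1:m)} \in \mathbb{R}^m_-$ and
$-\sum_{i=1}^m \tilde{u}^{(i)} L_i' < \theta$, we have

$\mathcal{L}(x_1,\tilde{u}) - \mathcal{L}(x_2,\tilde{u}) - \tilde{\nabla}_x \mathcal{L}(x_2,\tilde{u})^T (x_1 - x_2) \ge \frac{1}{2}\Big(\theta + \sum_{i=1}^m \tilde{u}^{(i)} L_i'\Big) \|x_1 - x_2\|^2$

for each $x_1,x_2 \in X$ and each subgradient $\tilde{\nabla}_x \mathcal{L}(x_2,\tilde{u}) \in \partial_x \mathcal{L}(x_2,\tilde{u})$. Thus, $\mathcal{L}(\cdot,\tilde{u})$ is
strongly convex over $X$. Using $\tilde{u}$ in Assumption 3, we define the set

$\tilde{D} = \{u \in \mathbb{R}^{m+p} : u^{(1:m)} - \tilde{u}^{(1:m)} \in \mathbb{R}^m_+\} \supset D.$

For any $u \in \tilde{D}$, $\mathcal{L}(\cdot,u)$ is strongly convex over $X$ and thus $x(u)$ uniquely exists.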

The next lemma is an important step toward establishing the convergence rates
of $(u_k)_{k=0}^\infty$ and $(x(u_k))_{k=0}^\infty$:

Lemma 4.1. Consider problem (2.1) under Assumptions 1 and 3. Then, for any
$u \in D$ and $v \in \tilde{D}$,

(4.2)    $\|\nabla d(u) - \nabla d(v)\| \le \frac{\sqrt{m+1}}{\theta} \Big(\sigma_{\max}^2(A) + \sum_{i=1}^m L_i^2\Big)^{1/2} \cdot \max\Big\{\sigma_{\max}(A), \max_{i \in \{1,\ldots,m\}} \|\nabla g^{(i)}(x(u))\|, \max_{i \in \{1,\ldots,m\}} \|\nabla g^{(i)}(x(v))\|\Big\} \|u - v\|.$

Proof. See Appendix 7.9. □

The Lipschitz-like property of $\nabla d$ on $\tilde{D}$ established in Lemma 4.1 will be used to
derive further inequalities below. To present these, we need to introduce the following
additional notation: For any convex and compact set $S \subset D$, let

(4.3)    $\Psi(S) = \begin{cases} \operatorname{conv}\big(\{u + \beta(\nabla d(u) - \nabla d(v)) : u,v \in S, \ \beta \in [0, 1/\eta(S)]\}\big) & \text{if } \eta(S) > 0, \\ S & \text{otherwise,} \end{cases}$

where $\eta(S) \in [0,\infty)$ is defined by


(4.4) 


sup max 

u,v^S 


viS) = 





if sup max 

u,ves 




> 0 , 


otherwise. 


This guarantees that $\Psi(S)$ is compact and $S \subset \Psi(S) \subset \tilde{D}$. The expression of $\eta(S)$
is admittedly complicated, but it allows us to include the pathological cases that
$\nabla d(u)$ is constant over $S$ and that there are no inequality constraints (i.e., $m = 0$).
In particular, $\eta(S) = 0$ means the absence of equality constraints (i.e., $p = 0$) and
the invariance of $\nabla d(u)$ on $S$. In this case, the above definition still guarantees that
$u + \eta^{-1}(\nabla d(u) - \nabla d(v)) \in \Psi(S)$ $\forall u,v \in S$ $\forall \eta > \eta(S)$.

Under Assumption 3, the definitions of $G(u)$, $\gamma(u)$, and $L(S)$ in (3.1), (3.2),
and (3.6) can be extended to hold for any $u \in \tilde{D}$ and any compact set $S \subset \tilde{D}$. Also,
Lemma 2.1 still holds when $D$ is replaced by $\tilde{D}$, which implies that $0 < L(\Psi(S)) < \infty$.
Moreover, Lemma 4.1 implies that $\|\nabla d(u) - \nabla d(v)\| \le L(\Psi(S)) \|u - v\|$ $\forall u \in S$
$\forall v \in \Psi(S)$. With these observations, consider the following lemma:

Lemma 4.2. Consider problem (2.1) under Assumptions 1 and 3. Let $S \subset D$ be
convex and compact. Also let $\eta(S) \in [0,\infty)$, $\Psi(S) \subset \tilde{D}$, and $L(\Psi(S)) \in (0,\infty)$ be
defined in (4.4), (4.3), and (3.6), respectively. Then, for any $u \in S$ and $v \in \Psi(S)$,

(4.5)    $d(v) - d(u) - \nabla d(u)^T (v - u) \ge -\frac{L(\Psi(S))}{2} \|v - u\|^2.$


Moreover, for any $u,v \in S$ and any $\eta > 0$ such that $\eta \ge \eta(S)$,


(4.6) d{v) - d{u) - yd{u)'^{v - u) < “ “) “ '^c?(u)||^, 

(4.7) (Vd(u) - yd{v)f{u -v)< - i) ||Vd(u) - Vd(u)f. 


Proof. See Appendix 7.10. □ 

Remark 3. Lemma 4.2 is critical in deriving the convergence rates of the
projected dual gradient method (4.1). Note that Theorem 2.1.5 in [3] gives
inequalities similar to (4.5), (4.6), and (4.7). However, those inequalities require that $\nabla d$ is
Lipschitz continuous over $\mathbb{R}^{m+p}$ and their proofs do not apply to our case where $\nabla d$
is locally Lipschitz continuous on $D$. Indeed, as is suggested by Example 3 below,
when problem (2.1) reduces to the linearly constrained problem (3.14), Lemma 4.2 is
specialized to Theorem 2.1.5 in [3].

Having established the inequalities in Lemmas 4.1 and 4.2, we provide the dual
and primal convergence rates for the projected dual gradient method (4.1):

Theorem 4.3. Consider problem (2.1) under Assumptions 1 and 3. Let $(u_k)_{k=0}^\infty \subset
D$ be a sequence generated by the projected dual gradient method (4.1). Also, let
$u^* \in D^*$ and $D_0 = \{u \in D : \|u - u^*\| \le \|u_0 - u^*\|\} \subset D$. Moreover, let $\Phi(D_0) \subset D$ be
defined in (3.10), $L(\Phi(D_0)) \in (0,\infty)$ in (3.6), $\eta(D_0) \in [0,\infty)$ in (4.4), $\Psi(D_0) \subset \tilde{D}$
in (4.3), and $L(\Psi(D_0)) \in (0,\infty)$ in (3.6). If


(4.8) 


0 < a < 


f L{^{Do))’ 


L{^{Do)) \ 

2r,{Doy 


then for any k > 0, 


if Li'i’iDo)) > v{Do), 
otherwise, 


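(4.9)    $d^* - d(u_k) \le \frac{R_0}{1 + k R_0 \delta \rho^{-1}},$

(4.10)    $\|x(u_k) - x^*\| \le \Big(\frac{2 R_0 \theta^{-1}}{1 + k R_0 \delta \rho^{-1}}\Big)^{1/2},$

(4.11)    $f(x(u_k)) - f^* \le (\|u^*\| + \|u_0 - u^*\|) \Big(\frac{2(m+p) L(\Phi(D_0)) R_0}{1 + k R_0 \delta \rho^{-1}}\Big)^{1/2},$

(4.12)    $f(x(u_k)) - f^* \ge -\|u^*\| \Big(\frac{2 L(\Phi(D_0)) R_0}{1 + k R_0 \delta \rho^{-1}}\Big)^{1/2},$

(4.13)    $\Delta(x(u_k)) \le \Big(\frac{2 L(\Phi(D_0)) R_0}{1 + k R_0 \delta \rho^{-1}}\Big)^{1/2},$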

where $R_0 = d^* - d(u_0) \in (0,\infty)$, $\rho = (\sup_{u \in D_0} \|\nabla d(u)\| + \|u_0 - u^*\|/\alpha)^2 \in (0,\infty)$,
and $\delta = 1/\alpha - L(\Psi(D_0))/2 \in (0,\infty)$.

Proof. See Appendix 7.11. □ 

Theorem 4.3 says that under Assumptions 1 and 3 as well as a proper step-size
choice (4.8), the dual function value at the dual iterates $(u_k)_{k=0}^\infty$ converges to $d^*$ at a
rate of $O(1/k)$. Note that this result extends earlier analysis of the projected gradient
method for functions with globally Lipschitz continuous gradient (e.g., [18]) to a class
of functions with locally Lipschitz continuous gradient on closed and convex sets in the
form of $D$. Moreover, this result implies that the primal iterates $(x(u_k))_{k=0}^\infty$ converge
at a rate no worse than $O(1/\sqrt{k})$ in primal optimality and feasibility. Furthermore,
although (4.8) provides a sufficient condition for the range of step-sizes that guarantees
these convergence rates, it does not explicitly tell how to select a proper step-size. In
the examples below, we show explicit step-size rules satisfying (4.8) for some important
problem classes.
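The projected dual gradient iteration discussed above can be sketched on a toy instance as follows. Everything specific here is an assumption made for illustration: the instance is $f(x) = \frac{\theta}{2}\|x - c\|^2$ with $\theta = 2$, $c = (2, 1)$, $X = \mathbb{R}^2$, one inequality $x^{(1)} + 2x^{(2)} - 1 \le 0$, and one equality $x^{(1)} - x^{(2)} = 0$; the step-size 0.3 is a hand-picked value that is small enough for this instance, not a value derived from the step-size condition of the theorem. For this instance $x^* = (1/3, 1/3)$ and $f^* = d^* = 29/9$ (worked out by hand).

```python
THETA = 2.0            # strong convexity parameter of f
C = (2.0, 1.0)         # f(x) = (THETA / 2) * ||x - C||^2, X = R^2
M_INEQ = 1             # number of inequality multipliers (leading entries of u)


def x_of_u(u):
    """Closed-form minimizer of the Lagrangian:
    THETA * (x - C) + u1 * [1, 2] + u2 * [1, -1] = 0."""
    return (C[0] - (u[0] + u[1]) / THETA, C[1] - (2.0 * u[0] - u[1]) / THETA)


def dual_gradient(u):
    """Constraint values at x(u): [g(x(u)), A x(u) + b]."""
    x = x_of_u(u)
    return [x[0] + 2.0 * x[1] - 1.0, x[0] - x[1]]


def dual_value(u):
    x = x_of_u(u)
    g = dual_gradient(u)
    f = 0.5 * THETA * ((x[0] - C[0]) ** 2 + (x[1] - C[1]) ** 2)
    return f + u[0] * g[0] + u[1] * g[1]


def project_onto_D(u):
    # Clip only the inequality multipliers at zero.
    return [max(0.0, ui) if i < M_INEQ else ui for i, ui in enumerate(u)]


def projected_dual_gradient(u0, alpha, iterations):
    """Run u_{k+1} = P_D[u_k + alpha * grad d(u_k)] and record d(u_k)."""
    u = list(u0)
    history = [dual_value(u)]
    for _ in range(iterations):
        g = dual_gradient(u)
        u = project_onto_D([ui + alpha * gi for ui, gi in zip(u, g)])
        history.append(dual_value(u))
    return u, history


u_final, dual_history = projected_dual_gradient([0.0, 0.0], alpha=0.3, iterations=200)
x_final = x_of_u(u_final)   # approximate primal solution at the last dual iterate
```

On this strongly concave toy dual the iterates converge much faster than the worst-case sublinear rate; the sketch is only meant to show how the approximate primal solution $x(u_k)$ is obtained as a by-product of the dual gradient evaluation.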

Example 1. Suppose that $X$ is a compact set. Note that if $m \ne 0$,


sup max 
M.ueS^cfi.■■■."*} 


-|V<Cd(u)-V<Cd(t;)| 
-sF)- 


< sup max 
u,veDo 


\g^^\x{u)) - g^^'fjxiv))] 
uM 


(4.14) < sup max -:r 7 ^||a:(u)-a;(?;)|| 

m.ugDo 

< max-7Adiam(A), 
where the first inequality is due to (2.4) and the second comes from Assumption 1(b).
Hence, 


vi^o) If V — max 


„ , max-7^ diam(X) 1 . 

6 J 


c(^) 


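Also note that

$\gamma(u) \le \bar{\gamma} = \frac{\sqrt{m+1}}{\theta} \max\Big\{\sigma_{\max}(A), \sup_{x \in X} \max_{i \in \{1,2,\ldots,m\}} \|\nabla g^{(i)}(x)\|\Big\}, \quad \forall u \in \tilde{D},$

and thus

$L(\Psi(D_0)) \le \bar{L} = \bar{\gamma} \Big(\sigma_{\max}^2(A) + \sum_{i=1}^m L_i^2\Big)^{1/2}.$

Since $X$ is compact, we have $0 < \bar{\eta} < \infty$ and $0 < \bar{L} < \infty$. Then, as long as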


(4.15) 


0 < a < 




if L> rj, 
otherwise, 


the step-size $\alpha$ satisfies (4.8). Notice that unlike $\eta(D_0)$ and $L(\Psi(D_0))$, the constants
$\bar{\eta}$ and $\bar{L}$ can be directly determined from the primal problem.

Example 2. Suppose, for each $i \in \{1,2,\ldots,m\}$, that $g^{(i)}$ is Lipschitz continuous
on an open set containing $X$ with Lipschitz constant $L_i > 0$. Then, $\|\nabla g^{(i)}(x)\| \le L_i$
$\forall x \in X$ $\forall i \in \{1,2,\ldots,m\}$, which implies that

$\gamma(u) \le \bar{\gamma} = \frac{\sqrt{m+1}}{\theta} \max\Big\{\sigma_{\max}(A), \max_{i \in \{1,\ldots,m\}} L_i\Big\}, \quad \forall u \in \tilde{D}.$


Due to (4.14) and Lemma 3.1, 


sup max 

u,v^S 




< max —• sup 7 (u) • sup ||m —z;|| 

2G{l,...,m} uCiDq u,v^Dq 

< max-T^ 7 diam(Z)o). 


Therefore, 


7(4>o) < 7 - maxl^^s^:^^, max -dmin{Do)\ . 

Also, $L(\Psi(D_0)) \le \bar{L} = \bar{\gamma} \big(\sigma_{\max}^2(A) + \sum_{i=1}^m L_i^2\big)^{1/2}$. Then, any step-size $\alpha$ satisfying
(4.15) meets (4.8). Here, the constants $\bar{\eta}$ and $\bar{L}$ solely depend on the primal problem
as well as an upper bound on the diameter of the set $D_0$.

Example 3. When problem (2.1) reduces to the linearly constrained problem (3.14), 
it becomes a special case of Example 2. In this case. Assumption 3(b) holds for every 
u £ with ufA'-'^l G M™. By taking ufl'-'^l sufficiently small, we can make rj in 

Example 2 equal to $\sigma_{\max}^2(A)/\theta$. Thus, $\eta(D_0) \le \sigma_{\max}^2(A)/\theta$. This, along
with the fact that $\nabla d$ is Lipschitz continuous with Lipschitz constant $\sigma_{\max}^2(A)/\theta$, implies
that any $0 < \alpha < 2\theta/\sigma_{\max}^2(A)$ satisfies (4.8). This step-size condition coincides with the

standard one used to guarantee the convergence of the gradient methods when the objective
function has globally Lipschitz continuous gradient [3]. With such a step-size,
(4.9)-(4.13) hold with $L(\Phi(D_0)) = L(\Psi(D_0)) = \sigma_{\max}^2(A)/\theta$. Also, from (3.15), we
have a tighter upper bound on the primal convergence rate


(4.16) 


f{x(uk)) - f* < crmax(^)(||u*|| + ||mo 


« 1 I) 


2i?D 


5-1 


1 + kRoSp 


1/2 


It is known that the projected gradient method is able to converge linearly when
the objective function is strongly convex [3]. Theorem 3.2 thus suggests that the primal
iterates could achieve linear convergence if the dual function is strongly concave.
Indeed, if the subgradients of $f$ satisfy a Lipschitz condition on $X$ with Lipschitz constant
$M > 0$, then the dual function for problem (3.14) with $A$ having full row rank
and $X = \mathbb{R}^n$ is strongly concave with concavity parameter $-\theta\sigma_{\min}^2(A)/M^2 < 0$ [19].
Therefore, for any $\alpha \in (0,\ 2M^2\theta/(\theta^2\sigma_{\min}^2(A) + M^2\sigma_{\max}^2(A))]$, we have

$\|u_k - u^*\| \le q^k\|u_0 - u^*\|,$

where $q = \left(1 - \frac{2\alpha\theta\sigma_{\min}^2(A)\sigma_{\max}^2(A)}{\theta^2\sigma_{\min}^2(A) + M^2\sigma_{\max}^2(A)}\right)^{1/2} \in [0,1)$. Moreover, $q$ reaches its minimum

$\left(1 - \frac{4M^2\theta^2\sigma_{\min}^2(A)\sigma_{\max}^2(A)}{(\theta^2\sigma_{\min}^2(A) + M^2\sigma_{\max}^2(A))^2}\right)^{1/2}$

when $\alpha = \frac{2M^2\theta}{\theta^2\sigma_{\min}^2(A) + M^2\sigma_{\max}^2(A)}$ [3].
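The linear rate for the strongly concave case can be checked numerically on a toy instance. The following Python sketch is illustrative only: the data `h`, `a`, `b` and the function name are assumptions, not taken from the paper. It uses $f(x) = \frac{1}{2}\sum_i h_i x_i^2$ with a single equality constraint $a^T x + b = 0$, so that $\theta = \min_i h_i$, $M = \max_i h_i$, $\sigma_{\min}(A) = \sigma_{\max}(A) = \|a\|$, and the projection onto $D = \mathbb{R}$ is the identity.

```python
import math


def dual_gradient_linear_rate(h, a, b, u0, num_iters):
    """Dual gradient ascent for min 0.5*sum(h_i*x_i^2) s.t. a.x + b = 0.

    Returns the step-size alpha, the contraction factor q from the text,
    and the dual errors |u_k - u*| for k = 0..num_iters.
    """
    theta, M = min(h), max(h)            # strong convexity / gradient Lipschitz constants of f
    sigma_sq = sum(ai * ai for ai in a)  # A has one row: sigma_min^2 = sigma_max^2
    denom = theta ** 2 * sigma_sq + M ** 2 * sigma_sq
    alpha = 2.0 * M ** 2 * theta / denom  # largest admissible step-size
    q = math.sqrt(1.0 - 2.0 * alpha * theta * sigma_sq * sigma_sq / denom)

    # Closed-form dual optimum: grad d(u) = -u * sum(a_i^2 / h_i) + b = 0.
    u_star = b / sum(ai * ai / hi for ai, hi in zip(a, h))

    u = u0
    errors = [abs(u - u_star)]
    for _ in range(num_iters):
        x = [-u * ai / hi for ai, hi in zip(a, h)]        # Lagrangian minimizer x(u)
        grad = sum(ai * xi for ai, xi in zip(a, x)) + b   # grad d(u) = a.x(u) + b
        u = u + alpha * grad                              # no projection needed (D = R)
        errors.append(abs(u - u_star))
    return alpha, q, errors
```

For `h = [1, 4]`, `a = [1, 2]`, `b = -3`, the step-size is $32/85$ and the factor is $q = 15/17$; the observed errors decay faster than $q^k$, as expected from a worst-case bound.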


4.2. Primal convergence in fast dual gradient methods. In this subsection,
we move on to fast dual gradient methods. To the best of the authors' knowledge,
all the existing fast gradient methods require that the gradient of the objective function
satisfy a Lipschitz condition on at least the feasible region in order to reach a
convergence rate of $O(1/k^2)$. Hence, throughout this subsection, we let Assumptions 1
and 2 hold, so that $\nabla d$ satisfies a Lipschitz condition on $D$ with Lipschitz constant $L$
defined in Corollary 3.5. We also assume that (an upper bound on) $\sup_{u\in D}\gamma(u)$ and
thus $L$ are known.§

We consider the 1-memory fast gradient method in [15] for solving the dual problem
(2.3). To start with, define the following: Let $h : \mathbb{R}^{m+p} \to \mathbb{R}$ be differentiable on
an open set containing $D$ and let $Q(u,v) = h(u) - h(v) - \nabla h(v)^T(u - v)$ $\forall u \in \mathbb{R}^{m+p}$
$\forall v \in D$. Assume $h$ is strictly convex and satisfies $Q(u,v) \ge \|u - v\|^2/2$ $\forall u,v \in D$.
Also, let $\ell_{-d}(u,v) = -d(v) - \nabla d(v)^T(u - v)$ $\forall u,v \in D$. For completeness, we provide
the algorithm below:

Algorithm 1 (Algorithm 1, [15]).

Initialization:

1. Let $\beta_0 = 1$ and choose $u_0, w_0 \in D$.

Operation: At each time $k \ge 0$:

2. Choose a closed convex set $U_k \subseteq \mathbb{R}^{m+p}$ such that $U_k \cap D^* \ne \emptyset$.

3. Let $v_k = (1 - \beta_k)u_k + \beta_k w_k$,

$w_{k+1} = \arg\min_{u \in U_k \cap D}\ \ell_{-d}(u, v_k) + \beta_k L Q(u, w_k)$,

$\hat{u}_{k+1} = (1 - \beta_k)u_k + \beta_k w_{k+1}$.

4. Choose $u_{k+1} \in D$ such that

$\ell_{-d}(u_{k+1}, v_k) + \frac{L}{2}\|u_{k+1} - v_k\|^2 \le \ell_{-d}(\hat{u}_{k+1}, v_k) + \frac{L}{2}\|\hat{u}_{k+1} - v_k\|^2$.

5. Choose $\beta_{k+1} \le 2/(k+3)$. ■

§In Examples 1 and 2, the upper bounds on $\sup_{u\in D}\gamma(u)$ and $L$ (i.e., $\tilde{\gamma}$ and $\tilde{L}$) can be easily
computed from the primal problem.

In Algorithm 1, the variables $u_k$, $v_k$, and $w_k$ remain in $D$ at all times. One
simple way to choose $U_k$ in Step 2 and $u_{k+1}$ in Step 4 is to take $U_k \supseteq D$ and $u_{k+1} =
\arg\min_{u\in D}\ \ell_{-d}(u, v_k) + LQ(u, v_k)$. In this case, if $Q(u,v) = \|u - v\|^2/2$, then
the updates of $w_{k+1}$ and $u_{k+1}$ reduce to the projected gradient steps $w_{k+1} = \mathcal{P}_D[w_k +
\nabla d(v_k)/(\beta_k L)]$ and $u_{k+1} = \mathcal{P}_D[v_k + \nabla d(v_k)/L]$. Other options for $U_k$ and $u_{k+1}$ can
also be found in [15]. Note that the above algorithm is specialized from the
more general Algorithm 1 in [15]. This is for the purpose of deriving the primal and
dual convergence rates in the following proposition:

Proposition 4.4. Consider problem (2.1) under Assumptions 1 and 2. Let
$(u_k)_{k=0}^{\infty} \subset D$ be a sequence generated by Algorithm 1 and let $u^* \in D^*$. Then, for any
$k \ge 1$,

(4.17)  $d^* - d(u_k) \le \frac{4LQ(u^*, w_0)}{(k+1)^2},$

(4.18)  $\|x(u_k) - x^*\| \le \frac{(8LQ(u^*, w_0)\theta^{-1})^{1/2}}{k+1},$

(4.19)  $f(x(u_k)) - f^* \le \frac{L\|u_k\|_\infty(8(m+p)Q(u^*, w_0))^{1/2}}{k+1} + \frac{4LQ(u^*, w_0)}{(k+1)^2},$

(4.20)  $f(x(u_k)) - f^* \ge -\frac{\|u^*\|L(8Q(u^*, w_0))^{1/2}}{k+1},$

(4.21)  $\Delta(x(u_k)) \le \frac{L(8Q(u^*, w_0))^{1/2}}{k+1},$

where $L$ is defined in Corollary 3.5.

Proof. See Appendix 7.12. □ 
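To make the simple variant of Algorithm 1 concrete, the following Python sketch runs it with $U_k \supseteq D$, $Q(u,v) = \|u - v\|^2/2$, and $\beta_k = 2/(k+2)$ on a small projection problem. Everything about the instance is an assumption made for illustration: the problem $\min \frac{1}{2}\|x - c\|^2$ subject to $Ax + b \le 0$ (so $\theta = 1$, $p = 0$, $D = \mathbb{R}^m_+$), the function names, and the use of any upper bound on $\sigma_{\max}^2(A)$ as the constant `L`.

```python
def primal_minimizer(A, c, u):
    """x(u) = c - A^T u minimizes 0.5*||x - c||^2 + u^T (A x + b) over x."""
    return [cj - sum(A[i][j] * u[i] for i in range(len(A))) for j, cj in enumerate(c)]


def dual_gradient(A, b, c, u):
    """grad d(u) = A x(u) + b, the constraint residual at the Lagrangian minimizer."""
    x = primal_minimizer(A, c, u)
    return [sum(aij * xj for aij, xj in zip(row, x)) + bi for row, bi in zip(A, b)]


def dual_value(A, b, c, u):
    """d(u) = 0.5*||x(u) - c||^2 + u^T (A x(u) + b)."""
    x = primal_minimizer(A, c, u)
    g = dual_gradient(A, b, c, u)
    return 0.5 * sum((xj - cj) ** 2 for xj, cj in zip(x, c)) + sum(ui * gi for ui, gi in zip(u, g))


def fast_dual_gradient(A, b, c, L, num_iters):
    """Two-projection fast dual gradient method starting from u_0 = w_0 = 0."""
    m = len(A)
    u, w = [0.0] * m, [0.0] * m
    history = [u[:]]
    for k in range(num_iters):
        beta = 2.0 / (k + 2)                     # beta_0 = 1, beta_{k+1} <= 2/(k+3)
        v = [(1 - beta) * ui + beta * wi for ui, wi in zip(u, w)]
        g = dual_gradient(A, b, c, v)
        # Projected gradient steps; projection onto R^m_+ is a componentwise max.
        w = [max(wi + gi / (beta * L), 0.0) for wi, gi in zip(w, g)]
        u = [max(vi + gi / L, 0.0) for vi, gi in zip(v, g)]
        history.append(u[:])
    return history
```

With `A = [[1, 1], [1, 0]]`, `b = [0, 0]`, `c = [1.5, 1.0]`, one dual optimum is $u^* = (1, 0.5)$ with $d^* = 1.625$, so the dual gap of each iterate can be compared directly against the right-hand side of (4.17) with $Q(u^*, w_0) = \|u^*\|^2/2$.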

Proposition 4.4 says that Algorithm 1 yields an $O(1/k^2)$ convergence rate of $(u_k)_{k=1}^{\infty}$
in dual optimality. In addition, the distance between $x(u_k)$ and $x^*$ as well as the primal
infeasibility of $x(u_k)$ vanish at a rate no worse than $O(1/k)$. As Algorithm 1 does
not guarantee that $(u_k)_{k=1}^{\infty}$ is bounded, it says nothing about the convergence rate of
$f(x(u_k))$. Nevertheless, if problem (2.1) has only inequality constraints, then the dual
optimal set is bounded [9] and so is $(u_k)_{k=1}^{\infty}$. This leads to the following proposition,
which states that in the absence of equality constraints, $f(x(u_k))$ converges to $f^*$ at
a rate $O(1/k)$ after some finite time:

Proposition 4.5. Consider problem (2.1) under Assumptions 1 and 2. Suppose
$p = 0$. Let $(u_k)_{k=0}^{\infty} \subset D$ be a sequence generated by Algorithm 1. Also, let $\tilde{x} \in \mathbb{R}^n$
satisfy Assumption 1(c), $u^* \in D^*$, and $\tilde{u} \in D \setminus D^*$. Then,

(4.22)  $f(x(u_k)) - f^* \le \frac{L(d(\tilde{u}) - f(\tilde{x}))(8(m+p)Q(u^*, w_0))^{1/2}}{(k+1)\max_{i\in\{1,2,\ldots,m\}} g^{(i)}(\tilde{x})} + \frac{4LQ(u^*, w_0)}{(k+1)^2}, \quad \forall k \ge \left(\frac{4LQ(u^*, w_0)}{d^* - d(\tilde{u})}\right)^{1/2}.$

Proof. See Appendix 7.13. □

Remark 4. In addition to Algorithm 1, i.e., the 1-memory fast gradient method
in [15], the dual problem (2.3) can also be solved by the $\infty$-memory fast gradient
method in [15], which would produce primal and dual convergence rates similar to
those in Propositions 4.4 and 4.5. Due to space limitations and since such analyses
are very similar to those in Propositions 4.4 and 4.5, we omit this algorithm in the
paper.

Recall that for the linearly constrained problem (3.14), the dual function $d$ is
differentiable and has globally Lipschitz continuous gradient with Lipschitz constant
$\sigma_{\max}^2(A)/\theta$. In this case, the following algorithm, which has a simpler form than
Algorithm 1, can be adopted to solve the dual problem:

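Algorithm 2 (Algorithm 2, [15]).

Initialization:

1. Let $\beta_0 = \beta_{-1} = 1$ and choose $u_0 = u_{-1} \in D$.

Operation: At each time $k \ge 0$:

2. Choose a closed convex set $U_k \subseteq \mathbb{R}^{m+p}$ such that $U_k \cap D^* \ne \emptyset$.

3. Let $v_k = u_k + \beta_k(1/\beta_{k-1} - 1)(u_k - u_{k-1})$ and

$u_{k+1} = \arg\min_{u \in U_k \cap D}\ \ell_{-d}(u, v_k) + \frac{\sigma_{\max}^2(A)}{2\theta}\|u - v_k\|^2$.

4. Choose $\beta_{k+1} \le 2/(k+3)$. ■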

If we pick $U_k \supseteq D$, then the update of $u_{k+1}$ in Step 3 is a projected gradient step
$u_{k+1} = \mathcal{P}_D[v_k + \theta\nabla d(v_k)/\sigma_{\max}^2(A)]$. If we also choose $\beta_{k+1} = (\sqrt{\beta_k^4 + 4\beta_k^2} - \beta_k^2)/2$,
then Algorithm 2 becomes the fast iterative shrinkage-thresholding algorithm (FISTA)
in [16] applied to solve the dual problem. The primal and dual convergence rates of
Algorithm 2, which have the same order as those of Algorithm 1 but a more explicit form,
are given below:

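Proposition 4.6. Consider the linearly constrained problem (3.14) under Assumption 1.
Let $(u_k)_{k=0}^{\infty} \subset D$ be a sequence generated by Algorithm 2 and $u^* \in D^*$.
Then, for any $k \ge 1$,

(4.23)  $d^* - d(u_k) \le \frac{2\sigma_{\max}^2(A)\|u_0 - u^*\|^2}{\theta(k+1)^2},$

(4.24)  $\|x(u_k) - x^*\| \le \frac{2\sigma_{\max}(A)\|u_0 - u^*\|}{\theta(k+1)},$

(4.25)  $-\frac{2\|u^*\|\sigma_{\max}^2(A)\|u_0 - u^*\|}{\theta(k+1)} \le f(x(u_k)) - f^* \le \frac{2\|u_k\|\sigma_{\max}^2(A)\|u_0 - u^*\|}{\theta(k+1)},$

(4.26)  $\Delta(x(u_k)) \le \frac{2\sigma_{\max}^2(A)\|u_0 - u^*\|}{\theta(k+1)}.$

Moreover, if $p = 0$, then

(4.27)  $f(x(u_k)) - f^* \le \frac{2\sigma_{\max}^2(A)(d(\tilde{u}) - f(\tilde{x}))\|u_0 - u^*\|}{\theta(k+1)\max_{i\in\{1,2,\ldots,m\}} g^{(i)}(\tilde{x})}, \quad \forall k \ge \sigma_{\max}(A)\|u_0 - u^*\|\left(\frac{2\theta^{-1}}{d^* - d(\tilde{u})}\right)^{1/2},$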

where $\tilde{x}$ and $\tilde{u}$ are defined as in Proposition 4.5.
Proof. See Appendix 7.14. □
The $O(1/k)$ primal convergence rates in (4.24) and (4.26) for the linearly constrained
problem (3.14) are also provided in [6,12]. Moreover, as is shown in [12], $\sigma_{\max}(A)$ can
be replaced by $\|A\|_{2,\infty} = \max\{\|Ax\|_\infty : \|x\| = 1\}$.
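The FISTA choice of $\beta_{k+1}$ used with Algorithm 2 is compatible with the requirement $\beta_{k+1} \le 2/(k+3)$ in Step 4, which can be verified numerically. The short Python sketch below is an illustration only (the function name is an assumption); it generates the sequence from $\beta_0 = 1$ and exposes the identity $(1 - \beta_{k+1})/\beta_{k+1}^2 = 1/\beta_k^2$, which follows from $\beta_{k+1}$ being the positive root of $\beta^2 + \beta_k^2\beta - \beta_k^2 = 0$.

```python
import math


def fista_beta_sequence(n):
    """Return [beta_0, ..., beta_{n-1}] with beta_0 = 1 and
    beta_{k+1} = (sqrt(beta_k^4 + 4*beta_k^2) - beta_k^2) / 2."""
    betas = [1.0]
    for _ in range(n - 1):
        bk = betas[-1]
        betas.append((math.sqrt(bk ** 4 + 4.0 * bk ** 2) - bk ** 2) / 2.0)
    return betas
```

Writing $t_k = 1/\beta_k$ recovers the familiar FISTA recursion $t_{k+1} = (1 + \sqrt{1 + 4t_k^2})/2$, so $t_{k+1} \ge t_k + 1/2$ and hence $\beta_k \le 2/(k+2)$ for all $k$.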

Compared with the projected dual gradient method (4.1), the fast dual gradient
methods considered above are capable of increasing the primal convergence rates from
$O(1/\sqrt{k})$ to $O(1/k)$. However, the problems that these methods can handle must satisfy
Assumption 2, which is not necessary for the projected dual gradient method. On the
other hand, in order to guarantee the sublinear dual and primal convergence rates,
the projected dual gradient method has to satisfy Assumption 3 while the fast dual
gradient methods do not. Moreover, the fast dual gradient methods are more complicated
to implement: in addition to solving $\min_{x\in X}\mathcal{L}(x, u_k)$ for constructing the
approximate primal solution $x(u_k)$, which is also needed in the projected dual gradient
method, they have to solve $\min_{x\in X}\mathcal{L}(x, v_k)$ in order to compute $\nabla d(v_k)$.

5. Numerical example. In this section, we compare the dual and primal convergence
performance of the dual first-order methods in Section 4 and the double
smoothing method [10] via a numerical example.

We consider the following model predictive control (MPC) problem, which has a
form very similar to the one formulated in [6]:

minimize over $x \in \mathbb{R}^n$:  $f(x) = \frac{1}{2}x^T H x + t^T x + \gamma\|Px - s\|_1$

subject to  $A_1 x + b_1 \le 0$,

$A_2 x + b_2 = 0$,

$x \in \{y \in \mathbb{R}^n : |y^{(i)}| \le r_i\ \forall i \in \{1,2,\ldots,n\}\}$,

where $H \in \mathbb{R}^{n\times n}$ is positive definite, $t \in \mathbb{R}^n$, $\gamma > 0$, $P \in \mathbb{R}^{5\times n}$, $s \in \mathbb{R}^{5}$, $A_1 \in \mathbb{R}^{m\times n}$,
$b_1 \in \mathbb{R}^m$, $A_2 \in \mathbb{R}^{p\times n}$, $b_2 \in \mathbb{R}^p$, $r_i > 0$, all of which are randomly generated with
$n = 10$, $m = 3$, and $p = 2$. Note that such a linearly constrained problem
belongs to the intersection of the problem classes that the projected dual gradient
method (4.1), the fast dual gradient methods in Algorithms 1 and 2, and the double
smoothing method in [10] can handle. Also, since Algorithm 1 has convergence
rates similar to those of Algorithm 2 for this problem, we omit Algorithm 1 to be able to visualize
the results better.

For the projected dual gradient method (4.1), we choose the step-size
$\alpha = 2\lambda_{\min}(H)/\lambda_{\max}(A_1^T A_1 + A_2^T A_2) \times 99\%$, which satisfies the step-size condition
in Example 3. For the fast gradient method in Algorithm 2, we choose the parameters
$U_k$ and $\beta_k$ such that this method reduces to FISTA [16]. For the double
smoothing method [10], since the linear constraint of the problem class that this
method can handle is in the form of $\mathcal{A}x \in T$ with $\mathcal{A}$ being a linear operator and $T$
being a compact set, we put $\mathcal{A} = A_2$, $T = \{-b_2\}$, and view $\{y \in \mathbb{R}^n : A_1 y + b_1 \le
0,\ |y^{(i)}| \le r_i\ \forall i = 1,2,\ldots,n\}$ as its set constraint. Moreover, since one smoothing
parameter in the double smoothing method relies on an upper bound on some dual
optimum, we adopt its practical implementation version in [10], which starts with an
initial guess of this upper bound and repeatedly applies the method to a sequence
of doubly smoothed dual problems with increasing guess on the upper bound until a
correct guess is achieved. We choose the desired accuracy $\epsilon$ of the method to be 0.05.
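The step-size rule used for the projected dual gradient method can be illustrated in code. The Python sketch below is a simplified stand-in, not the experiment reported here: it assumes $\gamma = 0$, no box constraint, and a diagonal $H$ (so that the Lagrangian minimizer has a closed form), and all names and data are hypothetical. It computes $\alpha = 0.99 \cdot 2\lambda_{\min}(H)/\lambda_{\max}(A^T A)$ for the stacked matrix $A = [A_1; A_2]$, using a plain power iteration for the largest eigenvalue, and runs the iteration $u_{k+1} = \mathcal{P}_D[u_k + \alpha\nabla d(u_k)]$.

```python
def mat_vec(M, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def largest_eigenvalue(S, iters=500):
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    v = [1.0] * len(S)
    lam = 0.0
    for _ in range(iters):
        w = mat_vec(S, v)
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]
        lam = sum(vi * si for vi, si in zip(v, mat_vec(S, v)))  # Rayleigh quotient
    return lam


def projected_dual_gradient_qp(h, t, A, b, m, num_iters):
    """Projected dual gradient ascent for
    min 0.5*sum(h_i*x_i^2) + t.x  s.t. (A x + b)_i <= 0 for i < m, = 0 otherwise."""
    n = len(h)
    gram = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    alpha = 0.99 * 2.0 * min(h) / largest_eigenvalue(gram)  # 99% of 2*theta/sigma_max^2(A)
    u = [0.0] * len(A)
    dual_values = []
    for _ in range(num_iters):
        # Lagrangian minimizer x(u) = -H^{-1}(t + A^T u); closed form since H is diagonal.
        x = [-(t[j] + sum(A[i][j] * u[i] for i in range(len(A)))) / h[j] for j in range(n)]
        grad = [gi + bi for gi, bi in zip(mat_vec(A, x), b)]  # grad d(u) = A x(u) + b
        dual_values.append(0.5 * sum(hj * xj * xj for hj, xj in zip(h, x))
                           + sum(tj * xj for tj, xj in zip(t, x))
                           + sum(ui * gi for ui, gi in zip(u, grad)))
        u = [ui + alpha * gi for ui, gi in zip(u, grad)]
        # Project onto D: multipliers of inequality constraints must be nonnegative.
        u = [max(ui, 0.0) if i < m else ui for i, ui in enumerate(u)]
    return alpha, dual_values
```

Because $\alpha$ is below $2\theta/\sigma_{\max}^2(A)$, the recorded dual values should be nondecreasing and, by weak duality, never exceed the objective value of any feasible point.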

In addition to the approximate primal solution $x(u_k)$ studied in this paper, we
also consider in the simulation the average $\bar{x}_k = \sum_{\ell=0}^{k-1} x(u_\ell)/k$ of the primal iterates
as in [9] for the projected dual gradient method and FISTA, and the running average
$\hat{x}_k$ of the primal iterates, with the weights being $1/\beta_\ell$, as in [11] for FISTA.

Fig. 1. Primal optimality of approximate primal solutions and dual optimality of dual iterates
(The light grey, grey, and black dashed curves represent the dual optimality $d^* - d(u_k)$ of $u_k$ in
the projected dual gradient method, FISTA, and the double smoothing method, respectively. The
light grey, grey, and black solid curves represent the primal optimality $|f(x(u_k)) - f^*|$ of $x(u_k)$ in
the projected dual gradient method, FISTA, and the double smoothing method, respectively. The
light grey and grey dotted curves represent the primal optimality $|f(\bar{x}_k) - f^*|$ of $\bar{x}_k$ in the projected
dual gradient method and FISTA, respectively. The grey dash-dotted curve represents the primal
optimality $|f(\hat{x}_k) - f^*|$ of $\hat{x}_k$ in FISTA.).


For the double smoothing method, $(u_k)$ is a sequence generated by a fast gradient
method in [3, Sec. 2.2.1] applied to the smoothed dual; the approximate primal
solution $x(u_k)$ for this specific problem is the unique minimizer of the Lagrangian of
the original problem.

Figure 1 compares the convergence of the dual iterates generated by the three
methods mentioned above, along with various choices for the approximate primal
solution. Generally speaking, the dual convergence rate is faster than the primal in all
three methods. Also, the primal iterates converge faster than their averages
in the projected dual gradient method and FISTA. The projected dual gradient
method converges more slowly than the other two methods in dual and primal optimality.
FISTA and the double smoothing method have comparable performance, but the double
smoothing method has the drawback that it only guarantees a prespecified target
accuracy and does not ensure asymptotic convergence.

6. Conclusions. This paper studied primal convergence properties of dual first-order
methods for solving optimization problems with a strongly convex objective
function and general convex constraints including nonlinear inequality, linear equality,
and set constraints. The unique minimizer of the Lagrangian at the current dual
iterate, which is needed for evaluating the dual gradient, was considered as an approximate
primal solution. The errors of this approximate primal solution, both in
optimality and feasibility, were related to the dual errors. Sublinear dual and primal
convergence rates for the projected dual gradient method and a few fast dual gradient
methods were established.

Notably, this is the first work to provide primal convergence
rates of dual first-order methods for nonlinearly constrained convex optimization with
locally Lipschitz dual gradient. This work may also bring insights to future research on
the convergence performance of other approximate primal solutions such as running
averages of the primal iterates in dual first-order methods.

7. Appendix. 

7.1. Proof of Lemma 2.1. Let $S \subset D$ be compact, $\bar{c} = \max_{u\in S} d(u)$, and $\underline{c} =
\min_{u\in S} d(u)$, which are bounded due to the continuity of $d$. Also let $\partial_x\mathcal{L}$ denote the
subdifferential of $\mathcal{L}$ with respect to the first argument. Fix $u_0 \in S$. Since the function
$\mathcal{L}(\cdot, u_0)$ is strongly convex over $X$, the sublevel set $C = \{x \in X : \mathcal{L}(x, u_0) \le \bar{c}\}$ is
nonempty and compact. Thus, $s = \max_{x\in C}\sum_{i=1}^m (g^{(i)}(x))^2 + \|Ax + b\|^2 \in [0,\infty)$. Also
let $r = \max_{u\in S}\|u - u_0\| \in [0,\infty)$. Then, for any $u \in S$ and any $x \in C$,

(7.1)  $\mathcal{L}(x, u_0) - r\sqrt{s} \le \mathcal{L}(x, u) \le \mathcal{L}(x, u_0) + r\sqrt{s}.$

To prove the boundedness of $\{x(u) : u \in S\}$, assume to the contrary that it is
unbounded. Thus, given $x_0 \in C$, there exists $u' \in S$ such that $\|x(u') - x_0\|^2 >
2(\mathcal{L}(x_0, u_0) + r\sqrt{s} - \underline{c})/\theta$. Since $\mathcal{L}(\cdot, u')$ is strongly convex over $X$ with convexity parameter
$\theta$ and since there exists a subgradient $\nabla_x\mathcal{L}(x(u'), u') \in \partial_x\mathcal{L}(x(u'), u')$ satisfying
$\nabla_x\mathcal{L}(x(u'), u')^T(x - x(u')) \ge 0$ $\forall x \in X$, we have $\mathcal{L}(x_0, u') - \underline{c} \ge \mathcal{L}(x_0, u') -
\mathcal{L}(x(u'), u') \ge \frac{\theta}{2}\|x(u') - x_0\|^2$. This gives $\mathcal{L}(x_0, u') > \mathcal{L}(x_0, u_0) + r\sqrt{s}$, which contradicts
(7.1). Therefore, $\{x(u) : u \in S\}$ is bounded.

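7.2. Proof of Lemma 3.1. Let $u, v \in D$. From [20, Theorem 3, Sec. 7.1.2],
there exist subgradients $\nabla_x\mathcal{L}(x(u), u) \in \partial_x\mathcal{L}(x(u), u)$ and $\nabla_x\mathcal{L}(x(v), v) \in \partial_x\mathcal{L}(x(v), v)$
such that

(7.2)  $\nabla_x\mathcal{L}(x(u), u)^T(x(v) - x(u)) \ge 0,$

(7.3)  $\nabla_x\mathcal{L}(x(v), v)^T(x(u) - x(v)) \ge 0.$

Due to [17, Prop. 4.2.4], $\partial_x\mathcal{L}(x, u) = \partial f(x) + \sum_{i=1}^m u^{(i)}\partial g^{(i)}(x) + A^T u^{(m+1:m+p)}$
$\forall x \in \mathbb{R}^n$ $\forall u \in D$. Thus, there exist subgradients $\nabla f(x(u)) \in \partial f(x(u))$, $\nabla f(x(v)) \in
\partial f(x(v))$, and $\nabla g^{(i)}(x(u)) \in \partial g^{(i)}(x(u))$, $\nabla g^{(i)}(x(v)) \in \partial g^{(i)}(x(v))$, $\forall i \in \{1,2,\ldots,m\}$
such that

$\nabla_x\mathcal{L}(x(u), u) = \nabla f(x(u)) + \sum_{i=1}^m u^{(i)}\nabla g^{(i)}(x(u)) + A^T u^{(m+1:m+p)},$

$\nabla_x\mathcal{L}(x(v), v) = \nabla f(x(v)) + \sum_{i=1}^m v^{(i)}\nabla g^{(i)}(x(v)) + A^T v^{(m+1:m+p)}.$

Adding (7.2) and (7.3) and substituting the above into the resulting inequality, we
obtain

$(\nabla f(x(u)) - \nabla f(x(v)))^T(x(u) - x(v)) \le (u^{(m+1:m+p)} - v^{(m+1:m+p)})^T A(x(v) - x(u))$

(7.4)  $\qquad + \sum_{i=1}^m \left(u^{(i)}\nabla g^{(i)}(x(u)) - v^{(i)}\nabla g^{(i)}(x(v))\right)^T(x(v) - x(u)).$

Note that

$(u^{(m+1:m+p)} - v^{(m+1:m+p)})^T A(x(v) - x(u))$

$\le \|A^T(u^{(m+1:m+p)} - v^{(m+1:m+p)})\| \cdot \|x(u) - x(v)\|$

(7.5)  $\le \sigma_{\max}(A)\|x(u) - x(v)\| \cdot \|u^{(m+1:m+p)} - v^{(m+1:m+p)}\|.$

In addition,

$\sum_{i=1}^m \left(u^{(i)}\nabla g^{(i)}(x(u)) - v^{(i)}\nabla g^{(i)}(x(v))\right)^T(x(v) - x(u))$

$= \sum_{i=1}^m u^{(i)}\left(\nabla g^{(i)}(x(u)) - \nabla g^{(i)}(x(v))\right)^T(x(v) - x(u))$

$\quad + \sum_{i=1}^m (u^{(i)} - v^{(i)})\nabla g^{(i)}(x(v))^T(x(v) - x(u))$

(7.6)  $\le \sum_{i=1}^m \|\nabla g^{(i)}(x(v))\| \cdot \|x(u) - x(v)\| \cdot |u^{(i)} - v^{(i)}|,$

where the inequality holds because for each $i \in \{1,2,\ldots,m\}$, $g^{(i)}$ is convex and
$u^{(i)} \ge 0$. From (7.4), (7.6), (7.5), and the strong convexity of $f$ over $X$, we have

$\theta\|x(u) - x(v)\|^2 \le \left(\sum_{i=1}^m \|\nabla g^{(i)}(x(v))\| \cdot \|x(u) - x(v)\| \cdot |u^{(i)} - v^{(i)}|\right)$

$\quad + \sigma_{\max}(A)\|x(u) - x(v)\| \cdot \|u^{(m+1:m+p)} - v^{(m+1:m+p)}\|,$

which yields

(7.7)  $\|x(u) - x(v)\| \le \frac{1}{\theta}\left(\left(\sum_{i=1}^m \sup_{g\in G(v)}\|g\| \cdot |u^{(i)} - v^{(i)}|\right) + \sigma_{\max}(A)\|u^{(m+1:m+p)} - v^{(m+1:m+p)}\|\right)$

$\le \frac{1}{\theta}\max\{\sigma_{\max}(A),\ \sup_{g\in G(v)}\|g\|\}\left(\left(\sum_{i=1}^m |u^{(i)} - v^{(i)}|\right) + \|u^{(m+1:m+p)} - v^{(m+1:m+p)}\|\right)$

$\le \frac{1}{\theta}\max\{\sigma_{\max}(A),\ \sup_{g\in G(v)}\|g\|\}\sqrt{m+1}\left(\left(\sum_{i=1}^m |u^{(i)} - v^{(i)}|^2\right) + \|u^{(m+1:m+p)} - v^{(m+1:m+p)}\|^2\right)^{1/2}$

$= \gamma(v)\|u - v\|.$

Since $u$ and $v$ are interchangeable, (3.3) is satisfied.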

7.3. Proof of Theorem 3.2. Let $u \in D$ and $u^* \in D^*$. Then, by letting $v$
in Lemma 3.1 be $u^*$, we get (3.4). To derive (3.5), note that $\mathcal{L}(\cdot, u)$ is strongly
convex on $X$ with convexity parameter $\theta$. Also note that there exists a subgradient
$\nabla_x\mathcal{L}(x(u), u) \in \partial_x\mathcal{L}(x(u), u)$ such that $\nabla_x\mathcal{L}(x(u), u)^T(x - x(u)) \ge 0$ $\forall x \in X$. As a
result, $\frac{\theta}{2}\|x(u) - x^*\|^2 \le \mathcal{L}(x^*, u) - \mathcal{L}(x(u), u) - \nabla_x\mathcal{L}(x(u), u)^T(x^* - x(u)) \le \mathcal{L}(x^*, u) -
\mathcal{L}(x(u), u)$. From the Lagrangian saddle point theorem [17, Prop. 6.2.4], $\mathcal{L}(x^*, u) \le
\mathcal{L}(x^*, u^*) = d^*$. Also, $\mathcal{L}(x(u), u) = d(u)$. Therefore, (3.5) holds.
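The error bound (3.5), $\frac{\theta}{2}\|x(u) - x^*\|^2 \le d^* - d(u)$, can be checked numerically on a small instance. The Python sketch below is an illustration under stated assumptions: the problem $\min \frac{1}{2}\|x - c\|^2$ subject to $Ax + b \le 0$ (so $\theta = 1$), the data, and the function names are all hypothetical and not taken from the paper.

```python
def primal_minimizer(A, c, u):
    """x(u) = c - A^T u minimizes 0.5*||x - c||^2 + u^T (A x + b) over x."""
    return [cj - sum(A[i][j] * u[i] for i in range(len(A))) for j, cj in enumerate(c)]


def dual_value(A, b, c, u):
    """d(u) = 0.5*||x(u) - c||^2 + u^T (A x(u) + b)."""
    x = primal_minimizer(A, c, u)
    g = [sum(aij * xj for aij, xj in zip(row, x)) + bi for row, bi in zip(A, b)]
    return 0.5 * sum((xj - cj) ** 2 for xj, cj in zip(x, c)) + sum(ui * gi for ui, gi in zip(u, g))


def primal_error_gap(A, b, c, x_star, d_star, theta, u):
    """Return (d* - d(u)) - (theta/2)*||x(u) - x*||^2, which (3.5) says is >= 0."""
    x = primal_minimizer(A, c, u)
    dist_sq = sum((xj - sj) ** 2 for xj, sj in zip(x, x_star))
    return (d_star - dual_value(A, b, c, u)) - 0.5 * theta * dist_sq
```

For `A = [[1, 1], [1, -1]]`, `b = [-3, 0]`, `c = [2, 3]`, the primal solution is $x^* = (1, 2)$ with $d^* = f^* = 1$. At $u = (0, 1)$ the dual gap is 3 while $\frac{\theta}{2}\|x(u) - x^*\|^2 = 2$, so the bound holds with slack; at $u^* = (1, 0)$ both sides vanish.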
7.4. Proof of Proposition 3.3. Let $S \subset D$ be compact. Due to Lemma 3.1,
for any $u, v \in S$,

m 

||Vd(u) - yd{v)\\^ = ||A(x(u) - ^ ||gW(x(w)) - gW(x(n))f 

m 

< crLx(^)iis(w) - 

Z^l 
m 

sup7^M(o-Lx(^) ^11^- 

weS ^ / 


< 


i=l 


Therefore, (3.7) holds. In addition, if S is convex, the proof of [3, Lemma 1.2.3] can 
be applied to obtain (3.8). 

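7.5. Proof of Theorem 3.4. Let $S \subset D$ be compact, $u \in S$, and $u^* \in D^*$.
From (3.9), we know that to find an upper bound on $f(x(u)) - f^*$ in terms of $d^* - d(u)$,
it is sufficient to do so for $-\nabla d(u)^T u$.

Let $\mathcal{A}(u) = \{i \in \{1,2,\ldots,m\} : u^{(i)} + \frac{1}{L(\Phi(S))}\nabla^{(i)}d(u) < 0\}$ and $\mathcal{I}(u) = \{1,2,\ldots,m+p\} \setminus \mathcal{A}(u)$.
These sets identify the components of the dual vector $u$ for which, after a
gradient step, the projection onto $D$ will be active and inactive, respectively. Since
$-\nabla d(u)^T u = \sum_{i\in\mathcal{A}(u)} -\nabla^{(i)}d(u)u^{(i)} + \sum_{i\in\mathcal{I}(u)} -\nabla^{(i)}d(u)u^{(i)}$, below we derive
upper bounds on $\sum_{i\in\mathcal{A}(u)} -\nabla^{(i)}d(u)u^{(i)}$ and $\sum_{i\in\mathcal{I}(u)} -\nabla^{(i)}d(u)u^{(i)}$.

To this end, let $D^{(i)} = [0,\infty)$ for $i = 1,2,\ldots,m$ and $D^{(i)} = \mathbb{R}$ for $i = m+1,\ldots,m+p$
and note that $(\mathcal{P}_D[v])^{(i)} = \mathcal{P}_{D^{(i)}}[v^{(i)}]$ for all $v \in \mathbb{R}^{m+p}$ and all $i$. In this
way, $\mathcal{P}_{D^{(i)}}[u^{(i)} + \frac{1}{L(\Phi(S))}\nabla^{(i)}d(u)]$ equals $0$ for each $i \in \mathcal{A}(u)$ and equals
$u^{(i)} + \frac{1}{L(\Phi(S))}\nabla^{(i)}d(u)$ for each $i \in \mathcal{I}(u)$.
Also, since $\Phi(S) \supseteq S$, we have $L(\Phi(S)) \ge L(S)$ and thus $\mathcal{P}_D[u + \frac{1}{L(\Phi(S))}\nabla d(u)] \in
\Phi(S)$. It follows from (3.8) that

$d^* - d(u) \ge d(\mathcal{P}_D[u + \tfrac{1}{L(\Phi(S))}\nabla d(u)]) - d(u)$

(7.8)  $\ge \nabla d(u)^T\left(\mathcal{P}_D[u + \tfrac{1}{L(\Phi(S))}\nabla d(u)] - u\right) - \frac{L(\Phi(S))}{2}\left\|u - \mathcal{P}_D[u + \tfrac{1}{L(\Phi(S))}\nabla d(u)]\right\|^2.$

We now look at the right-hand side of (7.8). For each $i \in \mathcal{A}(u)$,

$\nabla^{(i)}d(u)\left(\mathcal{P}_{D^{(i)}}[u^{(i)} + \tfrac{1}{L(\Phi(S))}\nabla^{(i)}d(u)] - u^{(i)}\right) - \frac{L(\Phi(S))}{2}\left(u^{(i)} - \mathcal{P}_{D^{(i)}}[u^{(i)} + \tfrac{1}{L(\Phi(S))}\nabla^{(i)}d(u)]\right)^2$

$= -\nabla^{(i)}d(u)u^{(i)} - \frac{L(\Phi(S))}{2}(u^{(i)})^2 \ge -\nabla^{(i)}d(u)u^{(i)} + \frac{1}{2}\nabla^{(i)}d(u)u^{(i)} = -\frac{1}{2}\nabla^{(i)}d(u)u^{(i)},$

where the inequality is due to $0 \le u^{(i)} < -\frac{1}{L(\Phi(S))}\nabla^{(i)}d(u)$ $\forall i \in \mathcal{A}(u)$. For each
$i \in \mathcal{I}(u)$,

$\nabla^{(i)}d(u)\left(\mathcal{P}_{D^{(i)}}[u^{(i)} + \tfrac{1}{L(\Phi(S))}\nabla^{(i)}d(u)] - u^{(i)}\right) - \frac{L(\Phi(S))}{2}\left(u^{(i)} - \mathcal{P}_{D^{(i)}}[u^{(i)} + \tfrac{1}{L(\Phi(S))}\nabla^{(i)}d(u)]\right)^2 = \frac{1}{2L(\Phi(S))}(\nabla^{(i)}d(u))^2.$

Thus, (7.8) gives

(7.9)  $d^* - d(u) \ge \frac{1}{2}\left(\sum_{i\in\mathcal{A}(u)} -\nabla^{(i)}d(u)u^{(i)}\right) + \frac{1}{2L(\Phi(S))}\sum_{i\in\mathcal{I}(u)}(\nabla^{(i)}d(u))^2.$

This leads to

(7.10)  $\sum_{i\in\mathcal{A}(u)} -\nabla^{(i)}d(u)u^{(i)} \le 2(d^* - d(u)).$

Moreover, notice that

$\sum_{i\in\mathcal{I}(u)} -\nabla^{(i)}d(u)u^{(i)} \le \|u\|_\infty\sum_{i\in\mathcal{I}(u)}|\nabla^{(i)}d(u)| \le \|u\|_\infty\left((m+p)\sum_{i\in\mathcal{I}(u)}(\nabla^{(i)}d(u))^2\right)^{1/2}.$

Also note that for each $i \in \mathcal{A}(u)$, $\nabla^{(i)}d(u) < 0$ and thus $-\nabla^{(i)}d(u)u^{(i)} \ge 0$. It then
follows from (7.9) that

$\sum_{i\in\mathcal{I}(u)} -\nabla^{(i)}d(u)u^{(i)} \le \|u\|_\infty\sqrt{2L(\Phi(S))(m+p)(d^* - d(u))}.$

Due to this, (7.10), and (3.9), we conclude that (3.11) holds.

Next, we derive a lower bound on $f(x(u)) - f^*$. Since $\mathcal{L}(x(u), u^*) \ge \mathcal{L}(x(u^*), u^*) = f^*$,

$f(x(u)) - f^* \ge -\sum_{i=1}^m u^{*(i)}g^{(i)}(x(u)) - (u^{*(m+1:m+p)})^T(Ax(u) + b)$

$\ge -\sum_{i=1}^m u^{*(i)}\max\{0, g^{(i)}(x(u))\} - \|u^{*(m+1:m+p)}\| \cdot \|Ax(u) + b\|$

(7.11)  $\ge -\|u^*\|\Delta(x(u)).$

To bound $\Delta(x(u))$, note from (2.4) that if $g^{(i)}(x(u)) > 0$, then $i \in \mathcal{I}(u)$. In addition,
$\{m+1,\ldots,m+p\} \subset \mathcal{I}(u)$. Thus, $(\Delta(x(u)))^2 \le \sum_{i\in\mathcal{I}(u)}(\nabla^{(i)}d(u))^2$. It follows from
(7.9) that (3.13) holds. Due to (7.11) and (3.13), we obtain (3.12).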

7.6. Proof of Corollary 3.5. Since for any compact $S \subset D$, $L(\Phi(S)) \le L$, the
corollary follows from Theorem 3.4.

7.7. Proof of Corollary 3.6. Let $u, v \in \mathbb{R}^{m+p}$. Note that for problem (3.14),
(7.4) is equivalent to

$(\nabla f(x(u)) - \nabla f(x(v)))^T(x(u) - x(v)) \le (u - v)^T A(x(v) - x(u)).$

Additionally, similar to (7.5), we obtain

$(u - v)^T A(x(v) - x(u)) \le \sigma_{\max}(A)\|x(u) - x(v)\| \cdot \|u - v\|.$

Then, it follows from the strong convexity of $f$ on $X$ that $\|x(u) - x(v)\| \le \frac{\sigma_{\max}(A)}{\theta}\|u - v\|$.

7.8. Proof of Proposition 3.7. Let $u \in D$ and $u^* \in D^*$. We first prove
that (3.16) is satisfied. Note that $(\Delta(x(u)))^2 \le \|(Ax(u) + b) - (Ax^* + b)\|^2 =
\|\nabla d(u) - \nabla d(u^*)\|^2$. Further, due to the inequalities $d(u) - d(u^*) - \nabla d(u^*)^T(u - u^*) \le
-\frac{\theta}{2\sigma_{\max}^2(A)}\|\nabla d(u) - \nabla d(u^*)\|^2$ [3, Theorem 2.1.5] and $\nabla d(u^*)^T(u - u^*) \le 0$, we obtain

$\|\nabla d(u) - \nabla d(u^*)\|^2 \le -\frac{2\sigma_{\max}^2(A)}{\theta}\left(d(u) - d(u^*) - \nabla d(u^*)^T(u - u^*)\right)$

(7.12)  $\le \frac{2\sigma_{\max}^2(A)}{\theta}(d(u^*) - d(u)).$

Consequently, (3.16) holds. From (3.9), we have $f(x(u)) - f^* \le -\nabla d(u)^T u +
\nabla d(u^*)^T(u - u^*)$. This, along with $\nabla d(u^*)^T u^* = 0$ [17, Prop. 6.1.1], implies that

$f(x(u)) - f^* \le (\nabla d(u^*) - \nabla d(u))^T u \le \|u\| \cdot \|\nabla d(u^*) - \nabla d(u)\|.$

It follows from (7.12), (7.11), and (3.16) that (3.15) holds.

7.9. Proof of Lemma 4.1. Let u G D and v G D. We first show that 

llx(u) - x(u)|] < - u|] 

(7.13) • max|crniax(^), max || Vg(*)(S(M))|], max || Vg(*)(x(u))|| |. 

To prove (7.13), consider two mutually exclusive and exhaustive cases: 

Case (i) v G D. In this case, from (3.3), we obtain (7.13). 

Case (ii) v G D — D. Let I~ = {i G {1,2,..., m} : < 0} ^ 0 and 

w = VdIv], i-e., = 0 Vi € /“ and Vi G (1,2,..., m + p}\I~. Notice 

from the proof of Lemma 3.1 that under Assumption 3, (7.7) holds with u = w, which 
gives 


|]a;(u;) -x(u)|] 

^ m 

2 = 1 

Due again to (7.7), we obtain 
||x(u) -x(u;)|| 

.. m 

<^(E I|V5^*Hx(u))| 1 • |u« - u;«| + a„,ax(A)||u(’"+i^™+P) - u;(-+i^™+p)||) 
2 = 1 

=^(E iiv5«(s(«))h« +EiiVff^*Hs(«))ii • 

2e{l,2,...,m}\/“ 

(7.15) +a„,ax(A)|]u(™+i^'"+P) 

From (7.14), (7.15), and the fact that Vi G I , we have 

||i(M) - x(u)|] < ||x(u) - x(w)|| + ||x(w) - ^(^')|| 
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m 

<-(^max{||Vg«(x(u))||,||Vg«(x(^))||}|u«-z;«| 

+ CTn,ax(^) ||) . 

Akin to the last part of the proof of Lemma 3.1, it can be shown that (7.13) holds for 
this case. Having proved (7.13), we now show that (4.2) holds. Due to Assumption 3, 
Lemma 2.1 and (2.4) still hold when D is replaced by D. Thus, using the proof of 
Proposition 3.3 and (7.13), we have (4.2). 

7.10. Proof of Lemma 4.2. Let $u \in S$ and $v \in \Psi(S)$. Since $\Psi(S)$ is convex
and $S \subset \Psi(S)$, $u + \tau(v - u) \in \Psi(S)$ $\forall\tau \in [0,1]$. Then, from Lemma 4.1,

$\|\nabla d(u + \tau(v - u)) - \nabla d(u)\| \le \tau L(\Psi(S))\|v - u\|, \quad \forall\tau \in [0,1].$

It then follows from the proof of [3, Lemma 1.2.3] that (4.5) holds.

Next, we prove (4.6) and (4.7). Let $u, v \in S$. Note that if $\eta(S) = 0$, then
$\nabla d(u) = \nabla d(v)$. Thus, due to the concavity of $d$, (4.6) and (4.7) hold. Now assume
$\eta(S) > 0$. To prove (4.6), we utilize the idea from the proof of [3, Theorem 2.1.5].

We define a function $\phi : \Psi(S) \to \mathbb{R}$ such that $\phi(w) = d(w) - \nabla d(u)^T w$. Note that
$\phi$ is concave and $\nabla\phi(u) = 0$, which implies $\phi(u) \ge \phi(w)$ $\forall w \in \Psi(S)$. In addition,
$v + \frac{1}{\eta}\nabla\phi(v) = v + \frac{1}{\eta}(\nabla d(v) - \nabla d(u)) \in \Psi(S)$ $\forall\eta \ge \eta(S)$. It follows from (4.5) that

$\phi(u) \ge \phi(v + \tfrac{1}{\eta}\nabla\phi(v))$

$= d(v + \tfrac{1}{\eta}\nabla\phi(v)) - d(v) - \nabla d(v)^T\tfrac{1}{\eta}\nabla\phi(v) + \phi(v) + (\nabla d(v) - \nabla d(u))^T\tfrac{1}{\eta}\nabla\phi(v)$

$\ge -\frac{L(\Psi(S))}{2\eta^2}\|\nabla\phi(v)\|^2 + \phi(v) + \frac{1}{\eta}\|\nabla\phi(v)\|^2, \quad \forall\eta \ge \eta(S),$

which is equivalent to (4.6). Moreover, (4.7) can be obtained from (4.6) by interchanging
$u$ and $v$ and adding the two inequalities.

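7.11. Proof of Theorem 4.3. Let $u^* \in D^*$ and $\alpha$ satisfy (4.8). We first prove
that $u_k \in D_0$ $\forall k \ge 0$ by induction. Clearly, $u_0 \in D_0$. Suppose $u_k \in D_0$ for some
$k \ge 0$. Then, for any $\eta > 0$ such that $\eta \ge \eta(D_0)$,

$\|u_{k+1} - u^*\|^2 = \|\mathcal{P}_D[u_k + \alpha\nabla d(u_k)] - \mathcal{P}_D[u^* + \alpha\nabla d(u^*)]\|^2$

$\le \|u_k + \alpha\nabla d(u_k) - u^* - \alpha\nabla d(u^*)\|^2$

$= \|u_k - u^*\|^2 + 2\alpha(\nabla d(u_k) - \nabla d(u^*))^T(u_k - u^*) + \alpha^2\|\nabla d(u_k) - \nabla d(u^*)\|^2$

$\le \alpha\left(\alpha - \frac{4}{\eta} + \frac{2L(\Psi(D_0))}{\eta^2}\right)\|\nabla d(u_k) - \nabla d(u^*)\|^2 + \|u_k - u^*\|^2,$

where the last inequality is due to (4.7) and $u^* \in D_0$. Notice that over the set
$\{\eta > 0 : \eta \ge \eta(D_0)\}$, if $L(\Psi(D_0)) \ge \eta(D_0)$, then $\frac{2L(\Psi(D_0))}{\eta^2} - \frac{4}{\eta}$ achieves its minimum
$-\frac{2}{L(\Psi(D_0))}$ at $\eta = L(\Psi(D_0))$; otherwise, $\frac{2L(\Psi(D_0))}{\eta^2} - \frac{4}{\eta}$ reaches the minimum
at $\eta = \eta(D_0) > 0$. Hence, (4.8) leads to $\alpha - \frac{4}{\eta} + \frac{2L(\Psi(D_0))}{\eta^2} < 0$ at this minimizing $\eta$.
Therefore, $\|u_{k+1} - u^*\| \le \|u_k - u^*\|$, which means that $u_{k+1} \in D_0$. This completes
the proof by induction. With this property, we now prove (4.9). Because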

of Proposition 3.3 and because $D_0 \subset \Psi(D_0)$, $\nabla d$ satisfies a Lipschitz condition on
$D_0$ with Lipschitz constant no more than $L(\Psi(D_0))$. It then follows from the proof
of [18, Theorem 5.1] that

d{uk) - d{uk+i) < -S\\uk+i - Ufcll^, 

d* - d{Uk) < y^\\Uk+l - Uk\\. 

Also, from (4.8), we have a < and thus d > 0. Consequently, 

d* - d{uk+i) <d*- d{uk) - -{d* - d{uk)Y■ 

P 

Then, from [20, Lemma 6, Sec. 2.2], (4.9) holds. Also, (4.9) and (3.5) give (4.10).
Moreover, note that $\|u_k\| \le \|u^*\| + \|u_0 - u^*\|$ $\forall k \ge 0$. Then, (4.11) holds due to this,
(3.11), and (4.9). Furthermore, (4.12) comes from (4.9) and (3.12). Finally, (4.9) and
(3.13) yield (4.13).

7.12. Proof of Proposition 4.4. Inequality (4.17) is derived in the proof of [15,
Corollary 1], which, along with (3.5) in Theorem 3.2, gives (4.18). In addition, from
Corollary 3.5, we obtain (4.19), (4.20), and (4.21).

7.13. Proof of Proposition 4.5. From [9, Lemma 1], for any $u \in \{u' \ge 0 :
d(u') \ge d(\tilde{u})\}$,

$\|u\|_\infty \le \|u\| \le \frac{d(\tilde{u}) - f(\tilde{x})}{\max_{i\in\{1,2,\ldots,m\}} g^{(i)}(\tilde{x})}.$

Also, due to (4.17), we have $d(u_k) \ge d(\tilde{u})$ when $k + 1 \ge \left(\frac{4LQ(u^*, w_0)}{d^* - d(\tilde{u})}\right)^{1/2}$ and $k \ge 1$.
It then follows from (4.19) that (4.22) holds.

7.14. Proof of Proposition 4.6. Inequality (4.23) comes from [15, Corollary 2]
and Proposition 3.7. Moreover, because of (3.5), (3.15), and (3.16), we obtain (4.24),
(4.25), and (4.26). Similar to the proof of Proposition 4.5, we have $d(u_k) \ge d(\tilde{u})$ if
$k + 1 \ge \sigma_{\max}(A)\|u_0 - u^*\|\left(\frac{2\theta^{-1}}{d^* - d(\tilde{u})}\right)^{1/2}$ and $k \ge 1$, implying that (4.27) holds.